Abstract. Consider a balanced non triangular two-color Polya-Eggenberger urn process, assumed to be large, which means that the ratio $\sigma$ of the replacement matrix eigenvalues satisfies $1/2 < \sigma < 1$. The composition vector of both discrete time and continuous time models admits a drift which is carried by the principal direction of the replacement matrix. In the second principal direction, this random vector also admits an almost sure asymptotics, and a real-valued limit random variable arises, named $W^{DT}$ in discrete time and $W^{CT}$ in continuous time. The paper deals with the distributions of both $W$. Appearing as martingale limits, known to be nonnormal, these laws have remained rather mysterious up to now.

Exploiting the underlying tree structure of the urn process, we show that $W^{DT}$ and $W^{CT}$ are the unique solutions of two distributional systems in some suitable spaces of integrable probability measures. These systems are natural extensions of distributional equations that already appeared in famous algorithmic problems like Quicksort analysis. Existence and unicity of the solutions of the systems are obtained by means of contracting smoothing transforms. Via the equation systems, we find upper bounds for the moments of $W^{DT}$ and $W^{CT}$ and we show that the laws of $W^{DT}$ and $W^{CT}$ are moment-determined. We also prove that their densities are not bounded at the origin.
1 Introduction

Polya urns provide a rich model for many situations in algorithmics. Consider an urn that 
contains red and black balls. Start with a finite number of red and black balls as initial 
composition (possibly monochromatic). At each discrete time n, draw a ball at random, 
notice its color, put it back into the urn and add balls according to the following rule: if the drawn ball is red, add $a$ red balls and $b$ black balls; if the drawn ball is black, add $c$ red balls and $d$ black balls. The integers $a, b, c, d$ are assumed to be nonnegative¹. Thus, the replacement rule is described by the so-called replacement matrix

$$R = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
"Drawing a ball at random" means choosing uniformly among the balls contained in the 
urn. That is why this model is related to many situations in mathematics, algorithmics 
or theoretical physics where a uniform choice among objects determines the evolution of a 
process. See Johnson and Kotz's book [18], Mahmoud's book [21] or Flajolet et al. [16] for 
many examples. 

¹One classically admits negative values for $a$ and $d$, together with arithmetical conditions on $c$ and $b$.
Nevertheless, the paper deals with so-called large urns, for which this never happens. 
In the present paper, the urn is assumed to be balanced, which means that the total number of balls added at each step is a constant

$$S = a + b = c + d.$$

The composition vector of the urn at time $n$ is denoted by

$$U^{DT}(n) = \begin{pmatrix} \text{number of red balls at time } n \\ \text{number of black balls at time } n \end{pmatrix}.$$
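A minimal simulation may help make the drawing rule concrete. The sketch below is ours, not the paper's: the function name `simulate_urn` and the encoding `R = ((a, b), (c, d))` are illustrative choices, and the drawn ball is simply kept in the urn while the prescribed balls are added.

```python
import random


def simulate_urn(R, alpha, beta, n, rng=None):
    """Return the composition (red, black) after n drawings.

    R = ((a, b), (c, d)): drawing a red ball adds a red and b black balls,
    drawing a black ball adds c red and d black balls; the drawn ball is
    always put back into the urn.
    """
    rng = rng or random.Random()
    (a, b), (c, d) = R
    red, black = alpha, beta
    for _ in range(n):
        # uniform choice among the red + black balls currently in the urn
        if rng.randrange(red + black) < red:
            red, black = red + a, black + b
        else:
            red, black = red + c, black + d
    return red, black
```

For instance, with `R = ((6, 1), (2, 5))` (so $S = 7$) and one initial red ball, the two returned counts always add up to $1 + 7n$: this is the balance condition at work.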

Two main points of view are classically used on this random vector. The forward point of view consists in considering the composition vector sequence $(U^{DT}(n))_{n\geq 0}$ as an $\mathbb{N}^2$-valued Markov chain. The information on the successive configurations is thus concentrated in a global object, the random process, giving access to probabilistic tools like martingales, embedding in continuous time and branching processes. A vast part of the literature on Polya urns relies on such probability tools, dealing most often with natural extensions of the model to a random replacement matrix or to an arbitrary finite number of colors. The forward point of view is particularly efficient for obtaining results on the asymptotics of the process. See for instance Janson's seminal paper [17] or [30] for an extensive state of the art on such methods.

Alternatively, one can use the recursive properties of the random structure through a divide and conquer principle. This is the backward point of view. Applied to generating functions, it is the basic tool of the analytic combinatorics methods developed in the papers of Flajolet et al. [15, 16]. Expressed in terms of the random process, the backward approach leads to dislocation equations on limit distributions that can already be found in great generality in Janson [17]; these equations are further developed in [11] for two-color
urns and in [TUl [U] for the urn related to m-ary search trees as well. 

In order to state our results and also the asymptotic theorems they are based on, we first give some notations that are made more complete in Section 2. The eigenvalues of the replacement matrix $R$ are $S$ and the integer

$$m := a - c = d - b,$$

and we denote by

$$\sigma := \frac{m}{S}$$

the ratio between these eigenvalues. The particular case $\sigma = 1$ is the original Polya urn (see Polya [29]); this process has a specific, well known asymptotics with a random drift. In the appendix, Section 6 gathers results on this almost sure limit and on the asymptotic Dirichlet distribution as well. When $\sigma < 1$, it is well known that the asymptotics of the process has two different behaviours, depending on the position of $\sigma$ with respect to the value $1/2$ (see Athreya and Karlin [4] for the original result, and Janson [17] or [30] for the results below). Briefly said:
(i) when $\sigma < 1/2$, the urn is called small and, except when $R$ is triangular, the composition vector is asymptotically Gaussian²:

$$\frac{U^{DT}(n) - n v_1}{\sqrt{n}} \xrightarrow[n\to\infty]{\mathcal{L}} \mathcal{N}(0, \Sigma^2),$$

where $v_1$ is a suitable eigenvector of ${}^tR$ relative to $S$ and $\mathcal{N}(0, \Sigma^2)$ is a centered Gaussian vector whose covariance matrix $\Sigma^2$ has a simple closed form;

(ii) when $1/2 < \sigma < 1$, the urn is called large and the composition vector has a quite different strong asymptotic form:

$$U^{DT}(n) = n v_1 + n^{\sigma} W^{DT} v_2 + o(n^{\sigma}) \qquad (1)$$

where $v_1, v_2$ are suitable (non random) eigenvectors of ${}^tR$ relative to the respective eigenvalues $S$ and $m$, $W^{DT}$ is a real-valued random variable arising as the limit of a martingale, the little o being almost sure and in any $L^p$, $p \geq 1$.
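The classification by the ratio $\sigma$ can be sketched in a few lines of Python. The function `urn_regime` and its regime labels are hypothetical names chosen for illustration, and the sketch does not check whether $R$ is triangular.

```python
from fractions import Fraction


def urn_regime(R):
    """Return (S, m, sigma, regime) for a balanced replacement matrix R.

    Labels (ours): 'small' (sigma < 1/2), 'critical' (sigma = 1/2,
    normalisation sqrt(n log n)), 'large' (1/2 < sigma < 1) and
    'original Polya' (sigma = 1, i.e. b = c = 0).
    """
    (a, b), (c, d) = R
    S = a + b
    if c + d != S:
        raise ValueError("the urn is not balanced: a + b != c + d")
    m = a - c                      # second eigenvalue; also equal to d - b
    sigma = Fraction(m, S)         # exact ratio of the two eigenvalues
    if sigma == 1:
        regime = "original Polya"
    elif sigma > Fraction(1, 2):
        regime = "large"
    elif sigma == Fraction(1, 2):
        regime = "critical"
    else:
        regime = "small"
    return S, m, sigma, regime
```

Exact fractions are used so that the boundary case $\sigma = 1/2$ is detected without rounding issues.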

Classically, like for any Markov chain, one can embed the discrete time process $(U^{DT}(n))_{n\geq 0}$ into continuous time. In the case of Polya urns having a replacement matrix with nonnegative entries, this defines a two-type branching process $(U^{CT}(t))_{t\geq 0}$.

A similar phase transition occurs when $t$ tends to infinity: for small urns, the process $U^{CT}$ has a (random) almost sure drift and satisfies a Gaussian central limit theorem (see Janson [17]). When the urn is large, the asymptotic behaviour of the process, when $t$ tends to infinity, is given by

$$U^{CT}(t) = e^{St}\,\xi\, v_1\,(1 + o(1)) + e^{mt}\, W^{CT} v_2\,(1 + o(1)),$$

where $\xi$ is Gamma-distributed, $W^{CT}$ is a real-valued random variable arising as the limit of a martingale, the little o is almost sure and in any $L^p$, $p \geq 1$, the basis $(v_1, v_2)$ of deterministic vectors being the same one as in (1). These asymptotic results are detailed further in Section 2. Because of the canonical link between $U^{DT}$ and $U^{CT}$ via stopping times, the two random variables $W^{DT}$ and $W^{CT}$ are related by the so-called martingale connexion, as explained in Section 2.3. Consequently, any information about one distribution is of interest for the other one. All along the paper, the symbol DT is used to qualify discrete-time objects while CT refers to continuous-time ones.

In this article, we are interested in large urns. More precisely, the attention is focused on the non classical distributions of $W^{DT}$ and $W^{CT}$ when the replacement matrix $R$ is not triangular (i.e. when $bc \neq 0$). For example, $W^{CT}$ is not normally distributed, which can be seen on its exponential moment generating series, whose radius of convergence is equal to zero, as shown in [11] (see Section 2.2 for more details). Because of the martingale connexion,



²The case $\sigma = 1/2$ is similar to this one, the normalisation being $\sqrt{n \log n}$ instead of $\sqrt{n}$.
this implies that $W^{DT}$ is not normal either. Our main goal is to get descriptions of these laws (density, moments, tail, ...).

What is already known about $W^{DT}$ or $W^{CT}$? In [11], the Fourier transform of $W^{CT}$ is "explicitly" calculated, in terms of the inverse of an abelian integral on the Fermat curve of degree $m$. The existence of a density with respect to the Lebesgue measure on $\mathbb{R}$ and the fact that $W^{CT}$ is supported by the whole real line are deduced from this closed form. Nevertheless, the order of magnitude of the moments and the question of the determination of the law by its moments remained open. The shape of the density was mysterious, too. The present paper answers these questions in Sections 5 and 2.4 respectively.



In the present text, we exploit the underlying tree structure of a Polya urn. Governing both the backward and the forward points of view, it contains a richer structure than the plain composition vector process. Section 3 is devoted to highlighting this tree process and to deriving decomposition properties of the laws of the composition vector at finite time. These decompositions directly lead to the distributional fixed point systems (15) and (18), respectively satisfied by $W^{DT}$ and $W^{CT}$, as stated in Theorem 5 and Theorem 6.

With a slightly different approach, Knape and Neininger [20] start from the tree decomposition of the discrete Polya urn and establish the fixed point system (15) with the contraction method tools developed in Neininger-Rüschendorf [26]. This complementary point of view does not take advantage of the limit random variable $W^{DT}$ but applies to small and large urns together, which allows one to find limit Gaussian distributions, thus providing an alternative to the embedding method used by Janson in [17].

Sometimes called fixed point equations for the smoothing transform, or just smoothing equations in the literature (Liu [23], Durrett-Liggett [13]), distributional equations of type

$$X \overset{\mathcal{L}}{=} \sum_{i} A_i X^{(i)} \qquad (2)$$

have given rise to considerable interest and to a large literature. For a survey, see Aldous-Bandyopadhyay [1]. In theoretical probability, they are of relevance in connexion with branching processes (like in Liu [21], Biggins-Kyprianou [7], Alsmeyer et al. [2]) or with Mandelbrot cascades (Mandelbrot [25], Barral [6]). They occur in various areas of applied probability, and also in famous problems arising in the analysis of algorithms, like Quicksort (Rösler [31]). They are naturally linked with the analysis of recursive algorithms and data structures (Neininger-Rüschendorf [27], surveys in Rösler-Rüschendorf [32] or Neininger-Rüschendorf [28]).

Most often, in Equation (2), the $A_i$ are given random variables and the $X^{(i)}$ are independent copies of $X$, independent of the $A_i$ as well. Our System (18) with unknown real-valued random variables (or distributions) $X$ and $Y$ is the following:

$$\begin{cases} X \overset{\mathcal{L}}{=} U^{m}\left(\displaystyle\sum_{k=1}^{a+1} X^{(k)} + \sum_{k=a+2}^{S+1} Y^{(k)}\right) \\[2ex] Y \overset{\mathcal{L}}{=} U^{m}\left(\displaystyle\sum_{k=1}^{c} X^{(k)} + \sum_{k=c+1}^{S+1} Y^{(k)}\right) \end{cases}$$

where $U$ is uniform on $[0,1]$, and $X^{(k)}$ and $Y^{(k)}$ are respective copies of $X$ and $Y$, all being independent of each other and of $U$. Our System (15) for the discrete time limit $W^{DT}$, slightly more complicated, is essentially of the same type. These systems can be seen as natural generalizations of equations of type (2), as set out in Neininger-Rüschendorf [26].
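One common way to get a feel for solutions of a system like (18) is population dynamics: the right-hand side is applied repeatedly to empirical pools of samples. This is a Monte Carlo heuristic, not the contraction argument used in the paper, and the function `population_dynamics` together with its parameters is hypothetical. The sketch assumes the means $\mathbb{E}X = b/S$ and $\mathbb{E}Y = -c/S$ (the expectations of $W^{CT}_{(1,0)}$ and $W^{CT}_{(0,1)}$ with the normalisation of $u_2$ used in Section 2) and re-centres each pool on them, because the iteration by itself does not fix the means.

```python
import random


def population_dynamics(R, pool_size=4000, iterations=50, seed=0):
    """Approximate samples of (X, Y) solving System (18) for a large urn.

    Heuristic sketch: the right-hand side of (18) is applied repeatedly to
    two empirical pools. System (18) alone does not fix the means, and mean
    errors are not damped by the iteration (they can even be amplified), so
    each pool is re-centred on E X = b/S and E Y = -c/S.
    """
    (a, b), (c, d) = R
    S, m = a + b, a - c
    mean_x, mean_y = b / S, -c / S
    rng = random.Random(seed)
    xs = [mean_x] * pool_size          # start from point masses at the means
    ys = [mean_y] * pool_size

    def draw(n_x, n_y):
        # U^m * (sum of n_x picks from the X pool + n_y picks from the Y pool)
        total = sum(rng.choices(xs, k=n_x)) + sum(rng.choices(ys, k=n_y))
        return rng.random() ** m * total

    def recentre(pool, target):
        shift = target - sum(pool) / len(pool)
        return [v + shift for v in pool]

    for _ in range(iterations):
        new_xs = [draw(a + 1, b) for _ in range(pool_size)]   # first ball red
        new_ys = [draw(c, d + 1) for _ in range(pool_size)]   # first ball black
        xs, ys = recentre(new_xs, mean_x), recentre(new_ys, mean_y)
    return xs, ys
```

For $R = \begin{pmatrix} 6 & 1 \\ 2 & 5 \end{pmatrix}$, squaring (18) and taking expectations gives $\mathbb{E}X^2 = 29/49$ and $\mathbb{E}Y^2 = 44/49$; the empirical second moments of the returned pools should only be close to these values, up to Monte Carlo error.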



Section 4 is devoted to the existence and the unicity of solutions of our systems by means of a contraction method (Theorems 7 and 8), leading to a characterization of the distributions of $W^{DT}$ and $W^{CT}$.

Finally, in Section 5 we take advantage of the fixed point systems again to give accurate bounds on the moments of $W^{CT}$ (Lemma 3). Using this lemma, we show that the laws of $W^{DT}$ and $W^{CT}$ are determined by their moments (Corollary to Theorem 9).



2 Two-color Polya urn: definition and asymptotics 
2.1 Notations and asymptotics in discrete time 

Consider a two-color Polya-Eggenberger urn random process. We adopt the notations of the introduction: the replacement matrix $R = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is assumed to have nonnegative entries, with the integer $S$ as balance and $m$ as second (smaller) eigenvalue. We assume $R$ to be non triangular, i.e. that $bc \neq 0$; this implies that $m < S - 1$. Moreover, the paper deals with large urns, which means that the ratio $\sigma = m/S$ is assumed to satisfy

$$\frac{1}{2} < \sigma < 1.$$
We denote by $v_1$ and $v_2$ the vectors

$$v_1 = \frac{S}{b+c}\begin{pmatrix} c \\ b \end{pmatrix} \quad\text{and}\quad v_2 = \frac{S}{b+c}\begin{pmatrix} 1 \\ -1 \end{pmatrix}; \qquad (3)$$

they are eigenvectors of the matrix ${}^tR$, respectively associated with the eigenvalues $S$ and $m$. Let also $(u_1, u_2)$ be the dual basis

$$u_1(x,y) = \frac{1}{S}(x + y) \quad\text{and}\quad u_2(x,y) = \frac{1}{S}(bx - cy); \qquad (4)$$

$u_1$ and $u_2$ are eigenforms of ${}^tR$, respectively associated with the eigenvalues $S$ and $m$.
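The duality and eigen-relations can be checked with exact rational arithmetic. The sketch below assumes the normalisation of (3) and (4) as written above; the helper names `transpose_apply` and `eigen_basis` are ours.

```python
from fractions import Fraction


def transpose_apply(R, x, y):
    """Image of the column vector (x, y) by the transpose tR of R."""
    (a, b), (c, d) = R
    return (a * x + c * y, b * x + d * y)


def eigen_basis(R):
    """Exact vectors v1, v2 of (3) and linear forms u1, u2 of (4)."""
    (a, b), (c, d) = R
    S, m = a + b, a - c
    k = Fraction(S, b + c)
    v1 = (k * c, k * b)                    # eigenvector of tR for S
    v2 = (k, -k)                           # eigenvector of tR for m
    u1 = lambda x, y: Fraction(x + y) / S              # eigenform for S
    u2 = lambda x, y: Fraction(b * x - c * y) / S      # eigenform for m
    return S, m, v1, v2, u1, u2
```

For example, with `R = ((6, 1), (2, 5))` one gets `v1 = (14/3, 7/3)` and `v2 = (7/3, -7/3)`, and `u_i(v_j)` is 1 when `i == j` and 0 otherwise.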

When the urn contains $\alpha$ red balls and $\beta$ black balls at (discrete) time 0, the composition vector at time $n \in \mathbb{N}$ is denoted by

$$U^{DT}_{(\alpha,\beta)}(n).$$



Since the urn is assumed to be large, the asymptotics of its composition vector is given by 
the following result. 

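Theorem 1 (Asymptotics of discrete time process, [17], [30])

Let $\left(U^{DT}_{(\alpha,\beta)}(n)\right)_{n\geq 0}$ be a large Polya urn discrete time process. Then, when $n$ tends to infinity,

$$U^{DT}_{(\alpha,\beta)}(n) = n v_1 + n^{\sigma} W^{DT}_{(\alpha,\beta)} v_2 + o(n^{\sigma}), \qquad (5)$$

where $v_1$ and $v_2$ are the non random vectors defined by (3), and $W^{DT}_{(\alpha,\beta)}$ is the real-valued random variable defined by

$$W^{DT}_{(\alpha,\beta)} = \lim_{n\to+\infty} \frac{1}{n^{\sigma}}\, u_2\left(U^{DT}_{(\alpha,\beta)}(n)\right), \qquad (6)$$

$u_2$ being defined in (4), and where $o(\cdot)$ means almost surely and in any $L^p$, $p \geq 1$.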



A proof of this result can be found in Janson [17] by means of embedding in continuous time. Another one that remains in discrete time is also given in [30]. The present paper is focused on the distribution of $W^{DT}_{(\alpha,\beta)}$, which appears in both proofs as the limit of a bounded martingale. One remarkable fact that does not occur for small urns (i.e. when $\sigma < 1/2$) is that the distribution of $W^{DT}_{(\alpha,\beta)}$ actually depends on the initial composition vector $(\alpha, \beta)$. For example, its expectation turns out to be

$$\mathbb{E}\, W^{DT}_{(\alpha,\beta)} = \frac{\Gamma\left(\frac{\alpha+\beta}{S}\right)}{\Gamma\left(\frac{\alpha+\beta}{S} + \sigma\right)}\, \frac{b\alpha - c\beta}{S}. \qquad (7)$$



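This formula, explicitly stated in [11], can be shown by elementary means or using the convergent martingale

$$\left(\prod_{0\leq k\leq n-1}\left(1 + \frac{\sigma}{k + \frac{\alpha+\beta}{S}}\right)\right)^{-1} u_2\left(U^{DT}_{(\alpha,\beta)}(n)\right), \qquad n \geq 0.$$

For more developments about this discrete martingale, which is the essential tool in the discrete method for proving Theorem 1, see [30].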
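Formula (7) can be cross-checked numerically. Conditionally on $U^{DT}(n)$, one drawing multiplies the expectation of $u_2$ by $1 + \sigma/(n + \frac{\alpha+\beta}{S})$, so $\mathbb{E}\,u_2(U^{DT}(n))$ is an explicit product. The sketch below (function names hypothetical, normalisation of $u_2$ as in (4)) compares $n^{-\sigma}$ times this product with the Gamma ratio in (7).

```python
import math


def mean_w_dt(R, alpha, beta):
    """E W^DT_(alpha, beta) given by formula (7)."""
    (a, b), (c, d) = R
    S, m = a + b, a - c
    sigma, tau = m / S, (alpha + beta) / S
    return math.gamma(tau) / math.gamma(tau + sigma) * (b * alpha - c * beta) / S


def mean_u2_normalised(R, alpha, beta, n):
    """E[u2(U(n))] / n**sigma, using
    E u2(U(n)) = u2(alpha, beta) * prod_{k<n} (1 + sigma / (k + tau))."""
    (a, b), (c, d) = R
    S, m = a + b, a - c
    sigma, tau = m / S, (alpha + beta) / S
    # sum of logarithms for numerical stability over many factors
    log_prod = math.fsum(math.log1p(sigma / (k + tau)) for k in range(n))
    return (b * alpha - c * beta) / S * math.exp(log_prod - sigma * math.log(n))
```

The finite-$n$ value differs from the limit by a relative error of order $1/n$, so a few hundred thousand factors already agree with (7) to many digits.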



The approach in analytic combinatorics makes it easy to compute the probability generating function of the number of (say) red balls in the urn at finite time, by iteration of some suitable partial differential operator. The treatment of Polya urns by analytic combinatorics is due to P. Flajolet and his co-authors and can be found in [15]. Figure 1 shows the exact distribution of the (normalized) number of red balls after 300 drawings, centered around its expectation. The computations have been carried out with Maple and concern the (large) urn with replacement matrix $R = \begin{pmatrix} \ast & \ast \\ 3 & 17 \end{pmatrix}$ and respective initial compositions $(1, 0)$, $(1, 1)$ and $(0, 1)$.

[Figure 1: three panels, $(\alpha,\beta) = (1,0)$, $(\alpha,\beta) = (1,1)$ and $(\alpha,\beta) = (0,1)$.]

Figure 1: starting from initial composition $(\alpha, \beta)$, exact distribution of the number of red balls after $n = 300$ drawings, centered around its mean and divided by $n^{\sigma}$.

Some direct first observations can be made on these pictures. For example, one gets an illustration of the decomposition formula (12), which states that the distribution of $W_{(1,1)}$ is a weighted convolution of the distributions of $W_{(1,0)}$ and $W_{(0,1)}$.



2.2 Embedding in continuous time 

Classically, the discrete time process is embedded in a continuous time multitype branching process; the idea of embedding discrete urn models in continuous time branching processes goes back at least to Athreya and Karlin, and a description is given in Athreya and Ney [5], Section 9. The method has been revisited and developed by Janson, and we summarize below the results obtained in [11].

We define the continuous time Markov branching process

$$\left(U^{CT}_{(\alpha,\beta)}(t)\right)_{t\geq 0}$$

as being the embedded process of $\left(U^{DT}_{(\alpha,\beta)}(n)\right)_{n\geq 0}$. It starts from the same initial condition $U^{CT}_{(\alpha,\beta)}(0) = U^{DT}_{(\alpha,\beta)}(0) = (\alpha, \beta)$; at any moment, each ball is equipped with an $\mathcal{E}xp(1)$-distributed³ random clock, all the clocks being independent. When the clock of a red ball rings, $a$ red balls and $b$ black balls are added in the urn; when the ringing clock belongs to a black ball, one adds $c$ red balls and $d$ black balls, so that the replacement rules are the same as in the discrete time urn process.

The important benefit of considering such a process comes from the independence of the 
subtrees in the branching process. In the continuous-time urn process, each ball reproduces 
independently from the other balls. The asymptotics of this process is given by the following 
theorem. 

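Theorem 2 (Asymptotics of continuous time process, [17], [11])

Let $\left(U^{CT}_{(\alpha,\beta)}(t)\right)_{t\geq 0}$ be a large Polya urn continuous time process. Then, when $t$ tends to infinity,

$$U^{CT}_{(\alpha,\beta)}(t) = e^{St}\,\xi\, v_1\,(1 + o(1)) + e^{mt}\, W^{CT}_{(\alpha,\beta)}\, v_2\,(1 + o(1)), \qquad (8)$$

where $v_1, v_2, u_1, u_2$ are defined in (3) and (4), and $\xi$ and $W^{CT}_{(\alpha,\beta)}$ are real-valued random variables defined by

$$\xi = \lim_{t\to+\infty} e^{-St}\, u_1\left(U^{CT}_{(\alpha,\beta)}(t)\right), \qquad W^{CT}_{(\alpha,\beta)} = \lim_{t\to+\infty} e^{-mt}\, u_2\left(U^{CT}_{(\alpha,\beta)}(t)\right);$$

all the convergences are almost sure and in any $L^p$-space, $p \geq 1$. Furthermore, $\xi$ is Gamma$\left(\frac{\alpha+\beta}{S}\right)$-distributed.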
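In the embedding, only the total number of balls drives the jump times: with $N$ balls in the urn, the next clock rings after an $\mathcal{E}xp(N)$ time. The following sketch (hypothetical function `xi_estimate`) simulates these jump times and returns $e^{-S\tau_n}\, u_1(U^{CT}(\tau_n))$, an approximation of $\xi$. A short computation with the Laplace transforms of the exponential holding times shows that its expectation equals $(\alpha+\beta)/S$ for every $n$, which is the mean of the Gamma$\left(\frac{\alpha+\beta}{S}\right)$ limit; the check below is only statistical.

```python
import math
import random


def xi_estimate(alpha, beta, S, n, rng):
    """exp(-S * tau_n) * u1(U^CT(tau_n)) after n jumps of the embedded urn.

    Only the total number of balls matters here: with N balls, the next
    clock rings after an Exp(N) time, whatever the colours are.
    """
    t, balls = 0.0, alpha + beta
    for _ in range(n):
        t += rng.expovariate(balls)   # minimum of `balls` independent Exp(1) clocks
        balls += S                    # every drawing adds S balls (balanced urn)
    return math.exp(-S * t) * balls / S   # u1(x, y) = (x + y) / S
```

Averaging many independent estimates should give a mean and a variance both close to $(\alpha+\beta)/S$, as for a Gamma$\left(\frac{\alpha+\beta}{S}\right)$ variable.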

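Here again, the distribution of $W^{CT}_{(\alpha,\beta)}$ depends on the initial composition of the urn. For example, its expectation is

$$\mathbb{E}\, W^{CT}_{(\alpha,\beta)} = u_2(\alpha, \beta) = \frac{b\alpha - c\beta}{S},$$

as can be seen from the continuous-time martingale $\left(e^{-mt}\, u_2\left(U^{CT}_{(\alpha,\beta)}(t)\right)\right)_{t\geq 0}$.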

Some properties of $W^{CT}$ are already known. For example, it is supported by the whole real line $\mathbb{R}$ and admits a density. Moreover, this density is increasing on $\mathbb{R}_{<0}$, decreasing on $\mathbb{R}_{>0}$ and is not bounded in the neighbourhood of the origin. Note that it is not an even function since $W^{CT}$ is not centered. Finally, the characteristic function of $W^{CT}$ (i.e. its Fourier transform) is infinitely differentiable but not analytic at the origin: the domain of analyticity of $\mathbb{E}\exp(izW^{CT})$ is of the form $\mathbb{C} \setminus (L_+ \cup L_-)$, where $L_+$ and $L_-$ are half-lines contained in $\mathbb{R}$, one of them having the origin as endpoint. In particular, the exponential moment generating series of $W^{CT}$ has a radius of convergence equal to zero, due to a ramification and a divergent series phenomenon as well. All these properties are shown in [11], based on the expression of this characteristic function in terms of the inverse of an abelian integral on the Fermat curve $x^m + y^m + z^m = 0$.



³For any positive real $a$, $\mathcal{E}xp(a)$ denotes the exponential distribution with parameter $a$.






2.3 Connexion discrete time / continuous time

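As in any embedding into continuous time of a Markov chain, the discrete time process and the continuous time one are connected by

$$\left(U^{DT}_{(\alpha,\beta)}(n)\right)_{n\geq 0} = \left(U^{CT}_{(\alpha,\beta)}(\tau_n)\right)_{n\geq 0},$$

where

$$0 = \tau_0 < \tau_1 < \cdots < \tau_n < \cdots$$

are the jumping times of the continuous process. These random times are independent of the positions $U^{CT}_{(\alpha,\beta)}(\tau_n)$. The embedding for urn processes is widely studied in Janson [17]. It is detailed in [11] in the special case of two-color Polya urns. A dual formulation of this connexion is

$$\left(U^{CT}_{(\alpha,\beta)}(t)\right)_{t\geq 0} = \left(U^{DT}_{(\alpha,\beta)}(n(t))\right)_{t\geq 0},$$

where

$$n(t) := \sup\{n \geq 0,\ \tau_n \leq t\}$$

is the number of drawings in the urn before time $t$. After projection and normalization, these equalities provide two dual connexions between the limit variables $W^{DT}_{(\alpha,\beta)}$ and $W^{CT}_{(\alpha,\beta)}$:

$$W^{CT}_{(\alpha,\beta)} = \xi^{\sigma}\, W^{DT}_{(\alpha,\beta)} \quad\text{and}\quad W^{DT}_{(\alpha,\beta)} = \xi^{-\sigma}\, W^{CT}_{(\alpha,\beta)}, \qquad (10)$$

where $\xi$ and the $W_{(\alpha,\beta)}$'s are independent in both equalities, $\xi$ being Gamma$\left(\frac{\alpha+\beta}{S}\right)$-distributed.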



2.4 Shape of densities 



The observations made in Section 2.1 on Figure 1 can be seen as a first approximation of the shape of the density of $W^{DT}_{(\alpha,\beta)}$, if such a density exists! It is indeed shown in [11] that the law of $W^{CT}_{(\alpha,\beta)}$ turns out to be absolutely continuous with regard to Lebesgue measure on $\mathbb{R}$. The same property is deduced for $W^{DT}_{(\alpha,\beta)}$ from the martingale connexion Formula (10).



Theorem 3 The densities of $W^{DT}_{(\alpha,\beta)}$ and $W^{CT}_{(\alpha,\beta)}$ are infinitely differentiable on $\mathbb{R} \setminus \{0\}$, increasing on $]-\infty, 0[$ and decreasing on $]0, +\infty[$. Furthermore, if $f$ denotes any of these densities, there exists a positive constant $C_f$ such that

$$f(x) \geq \frac{C_f}{|x|^{1 - \frac{1}{m}}}$$

in a neighbourhood of the origin.
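Proof. The properties are shown in [11] in continuous time (Proposition 7.2 and its proof). We deduce the results in discrete time from connexion Formula (10). Let $f_{CT}$ (resp. $f_{DT}$) denote the density of $X^{CT} := W^{CT}_{(1,0)}$ (resp. $X^{DT} := W^{DT}_{(1,0)}$). These laws are related by the connexion $X^{DT} = \xi^{-\sigma} X^{CT}$, where $\xi$ is Gamma$(1/S)$-distributed and independent of $X^{CT}$. Consequently, for any bounded nonnegative function $\varphi$,

$$\mathbb{E}\,\varphi(X^{DT}) = \frac{1}{\Gamma(\frac{1}{S})}\int_0^{+\infty}\!\!\int_{\mathbb{R}} \varphi(x t^{-\sigma})\, f_{CT}(x)\, t^{\frac{1}{S}-1} e^{-t}\, dx\, dt = \int_{\mathbb{R}} \varphi(y) \left(\frac{1}{\Gamma(\frac{1}{S})}\int_0^{+\infty} f_{CT}(y t^{\sigma})\, t^{\sigma+\frac{1}{S}-1} e^{-t}\, dt\right) dy.$$

Consequently, almost everywhere,

$$f_{DT}(y) = \frac{1}{\Gamma(\frac{1}{S})}\int_0^{+\infty} f_{CT}(y t^{\sigma})\, t^{\sigma+\frac{1}{S}-1} e^{-t}\, dt.$$

We know from [11] that there exists a positive constant $C_f$ such that for any $x \in [-1, 1] \setminus \{0\}$,

$$f_{CT}(x) \geq \frac{C_f}{|x|^{1-\frac{1}{m}}}.$$

When $0 < |y| < 1$, split the integral above into two parts depending on whether $|y|\, t^{\sigma} < 1$ or not; this implies that

$$f_{DT}(y) \geq \frac{C_f}{\Gamma(\frac{1}{S})}\int_0^{|y|^{-1/\sigma}} \left(|y|\, t^{\sigma}\right)^{-1+\frac{1}{m}}\, t^{\sigma+\frac{1}{S}-1} e^{-t}\, dt = \frac{C_f\, C(y)}{\Gamma(\frac{1}{S})}\, \frac{1}{|y|^{1-\frac{1}{m}}}$$

with $C(y) = \int_0^{|y|^{-1/\sigma}} t^{\frac{2}{S}-1} e^{-t}\, dt$ (note that $\sigma/m = 1/S$). Since $C$ satisfies $0 < C(1) \leq C(y) \leq \Gamma(2/S)$ for any nonzero $y \in [-1, 1]$, the result is shown.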



3 Decomposition properties 

This section emphasizes the underlying tree structure of the urn process. This simple picture is indeed the key to the following two decompositions: first, we reduce the study of $W_{(\alpha,\beta)}$ to the study of $W_{(1,0)}$ and $W_{(0,1)}$, called later on $X$ and $Y$ respectively, to lighten the notations. Second, in Section 3.2, we exploit a "divide-and-conquer" property to deduce a system of fixed point equations on $X$ and $Y$. The reasoning is detailed in discrete time. It is much more straightforward in continuous time, since the decomposition is contained inside the branching property. Detailed in [11], the continuous case is briefly recalled in Section 3.3.

The natural question "is it possible to deduce the DT-system from the CT-system and conversely?" is addressed in Section 3.4.



3.1 Tree structure in discrete time 

In this section dealing with the discrete time process, we skip the index DT when no confusion 
is possible. 

Let us make precise the tree structure of the urn process: a forest $(\mathcal{T}_n)_n$ grows at each drawing from the urn. At time 0 the forest is reduced to $\alpha$ red nodes and $\beta$ black nodes, which are the roots of the forest trees. At time $n$, each leaf in the forest represents a ball in the urn. When a leaf is chosen (a ball is drawn), it becomes an internal node and gives birth to $(a+1)$ red leaves and $b$ black leaves, or $c$ red leaves and $(d+1)$ black leaves, according to the color of the chosen leaf.

The dynamics of the urn process was described by saying "at each time $n$, a ball is uniformly chosen in the urn". It becomes "a leaf is uniformly chosen among the leaves of the forest". This forest therefore appears as a non binary colored generalization of a binary search tree.

For example, take the urn with replacement matrix $R = \begin{pmatrix} 6 & 1 \\ 2 & 5 \end{pmatrix}$ (it is a large urn) and start from $\alpha = 3$ red balls and $\beta = 2$ black balls. Below is a possible configuration after 3 drawings.

Initial red balls are numbered from 1 to $\alpha$ and initial black balls from $\alpha + 1$ to $\alpha + \beta$. The following figure represents the forest coming from these initial balls.

[Figure: the forest grown from the $\alpha + \beta = 5$ initial balls.]

For any $n \geq 0$ and $k \in \{1, \ldots, \alpha+\beta\}$, denote by $D_k(n)$ the number of leaves of the $k$-th tree in the forest at time $n$. Thus, at time $n$, the number of drawings in the $k$-th tree is $\frac{D_k(n)-1}{S}$. This number represents the time inside this $k$-th tree.

Remember that the balls of the whole urn are uniformly drawn at any time and notice that at each drawing in the $k$-th tree, $D_k(n)$ increases by $S$: the random vector $D(n) = (D_1(n), \ldots, D_{\alpha+\beta}(n))$ has exactly the same distribution as the composition vector at time $n$ of an $(\alpha+\beta)$-color Polya urn process having $S I_{\alpha+\beta}$ as replacement matrix and $(1, \ldots, 1)$ as initial composition vector.

Gathering these arguments, the distribution of $U_{(\alpha,\beta)}(n)$ can be described the following way: consider simultaneously

(i) an original $(\alpha+\beta)$-color urn process $D = (D_1, \ldots, D_{\alpha+\beta})$ having $S I_{\alpha+\beta}$ as replacement matrix and $(1, \ldots, 1)$ as initial condition;

(ii) for any $k \in \{1, \ldots, \alpha\}$, an urn process $U^{(k)}_{(1,0)}$ having $R$ as replacement matrix and $(1, 0)$ as initial condition;

(iii) for any $k \in \{\alpha+1, \ldots, \alpha+\beta\}$, an urn process $U^{(k)}_{(0,1)}$ having $R$ as replacement matrix and $(0, 1)$ as initial condition,

all these processes being independent of each other. Then, the process $U_{(\alpha,\beta)} = \left(U_{(\alpha,\beta)}(n)\right)_{n}$ has the same distribution as the process defined by the sum of the $U^{(k)}_{(1,0)}$'s and the $U^{(k)}_{(0,1)}$'s taken at respective times $\frac{D_k(n)-1}{S}$. In other words, for any $n \geq 0$,

$$U_{(\alpha,\beta)}(n) \overset{\mathcal{L}}{=} \sum_{k=1}^{\alpha} U^{(k)}_{(1,0)}\left(\frac{D_k(n)-1}{S}\right) + \sum_{k=\alpha+1}^{\alpha+\beta} U^{(k)}_{(0,1)}\left(\frac{D_k(n)-1}{S}\right) \qquad (11)$$

where the $U^{(k)}_{(1,0)}$ and the $U^{(k)}_{(0,1)}$ are respective copies of the random vector processes $U_{(1,0)}$ and $U_{(0,1)}$, all being independent of each other and of $D$.
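The forest can be grown explicitly by remembering, for every ball, the initial ball it descends from. In this sketch (hypothetical function `simulate_forest`), the chosen leaf keeps its slot as one of its own children, so only the net numbers of new balls are appended. Each tree then satisfies $D_k(n) \equiv 1 \pmod S$, in line with the count $\frac{D_k(n)-1}{S}$ of drawings inside the $k$-th tree.

```python
import random


def simulate_forest(R, alpha, beta, n, rng):
    """Grow the forest of Section 3.1 during n drawings.

    Every leaf (ball) is stored as (colour, k), k being the index of the
    root it descends from; the result gives, for each initial ball k, the
    numbers of red and black leaves of the k-th tree.
    """
    (a, b), (c, d) = R
    leaves = [("red", k) for k in range(alpha)]
    leaves += [("black", k) for k in range(alpha, alpha + beta)]
    for _ in range(n):
        colour, k = rng.choice(leaves)          # uniform choice among all leaves
        # the chosen leaf becomes internal with (a+1) red and b black children
        # (or c red and d+1 black); one child reuses the leaf's own slot
        add_red, add_black = (a, b) if colour == "red" else (c, d)
        leaves += [("red", k)] * add_red + [("black", k)] * add_black
    trees = [[0, 0] for _ in range(alpha + beta)]
    for colour, k in leaves:
        trees[k][0 if colour == "red" else 1] += 1
    return [tuple(t) for t in trees]
```

Summing the per-tree compositions gives back the composition of the whole urn, which is the content of the decomposition (11) at the level of one trajectory.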

The following claim is a direct consequence of Proposition 2 in Section 6.

Claim When $n$ goes to infinity, $\frac{1}{nS}\left(D_1(n), \ldots, D_{\alpha+\beta}(n)\right)$ converges almost surely to a Dirichlet$\left(\frac{1}{S}, \ldots, \frac{1}{S}\right)$-distributed random vector, denoted by $Z = (Z_1, \ldots, Z_{\alpha+\beta})$.
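The Claim can be illustrated without simulation. For the $K$-color urn with replacement matrix $S I_K$ started from $(1, \ldots, 1)$, the first two moments of $D_1(n)$ obey an exact recursion, and $\mathbb{E}\left[(D_1(n)/N_n)^2\right]$, with $N_n = K + nS$ the total number of balls, should approach the second moment of the Beta$\left(\frac{1}{S}, \frac{K-1}{S}\right)$ marginal of the Dirichlet limit. The function names below are hypothetical.

```python
def proportion_second_moment(K, S, n):
    """E[(D_1(n) / N_n)^2] for the K-colour urn with replacement matrix S * I_K
    started from (1, ..., 1), where N_n = K + n * S."""
    e1, e2 = 1.0, 1.0            # E D_1(0) and E D_1(0)^2
    total = K
    for _ in range(n):
        # D_1 increases by S with probability D_1 / total:
        # E[D'^2 | D] = D^2 (1 + 2S/total) + S^2 D / total
        e2 = e2 * (1 + 2 * S / total) + S * S * e1 / total
        e1 = e1 * (1 + S / total)
        total += S
    return e2 / total ** 2


def beta_second_moment(p, q):
    """E[Z^2] for Z ~ Beta(p, q)."""
    return p * (p + 1) / ((p + q) * (p + q + 1))
```

With $K = 5$ and $S = 7$ (the forest of the example above), the limit value is $\frac{2}{15}$.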
Notice that for any $k$, $D_k(n)$ tends almost surely to $+\infty$ when $n$ tends to infinity. Starting from Equation (11), dividing by $n^{\sigma}$, taking the image by the second projection $u_2$ (notations of Section 2.1) and passing to the (almost sure) limit $n \to \infty$ thanks to Theorem 1, one obtains the following theorem.

Theorem 4 For any $(\alpha, \beta) \in \mathbb{N}^2 \setminus \{(0,0)\}$, let $W_{(\alpha,\beta)}$ be the limit distribution of a large two-color discrete time Polya urn process with ratio $\sigma$ and initial condition $(\alpha, \beta)$. Then,

$$W_{(\alpha,\beta)} \overset{\mathcal{L}}{=} \sum_{k=1}^{\alpha} Z_k^{\sigma}\, W^{(k)}_{(1,0)} + \sum_{k=\alpha+1}^{\alpha+\beta} Z_k^{\sigma}\, W^{(k)}_{(0,1)} \qquad (12)$$

where

(i) $Z = (Z_1, \ldots, Z_{\alpha+\beta})$ is a Dirichlet distributed random vector, with parameters $\left(\frac{1}{S}, \ldots, \frac{1}{S}\right)$;

(ii) the $W^{(k)}_{(1,0)}$ and the $W^{(k)}_{(0,1)}$ are respective copies of $W_{(1,0)}$ and $W_{(0,1)}$, all being independent of each other and of $Z$.

Notice that any $Z_k$ is Beta$\left(\frac{1}{S}, \frac{\alpha+\beta-1}{S}\right)$-distributed (see Section 6).



3.2 Discrete time fixed point equation 



Theorem 4 shows that the limit distribution of a large urn process starting with any initial composition can be written as a function of two "elementary" particular laws, namely the laws of $W_{(1,0)}$ and $W_{(0,1)}$. The present section gives a characterisation of these two distributions by means of a fixed point equation.

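Let $(U(n))_{n\geq 0}$ be a two-color Polya urn process, with all the notations of Section 2.1. In order to simplify the notations, denote

$$X := W^{DT}_{(1,0)} = \lim_{n\to+\infty} \frac{u_2\left(U_{(1,0)}(n)\right)}{n^{\sigma}}, \qquad Y := W^{DT}_{(0,1)} = \lim_{n\to+\infty} \frac{u_2\left(U_{(0,1)}(n)\right)}{n^{\sigma}}. \qquad (13)$$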



Focus now on the study of $U_{(1,0)}(n)$. At time 1 the composition of the urn is deterministic: there are $(a+1)$ red balls and $b$ black balls. Exactly like in Section 3.1, the tree structure of the urn appears, with a forest starting from $(a+1)$ red balls and $b$ black balls. In the same example with replacement matrix $R = \begin{pmatrix} 6 & 1 \\ 2 & 5 \end{pmatrix}$, this is illustrated by the following figure:
For any $n \geq 1$, denote by $J_k(n)$ the number of leaves at time $n$ of the $k$-th subtree. Then at time $n$, the number of drawings in the $k$-th subtree is $\frac{J_k(n)-1}{S}$ so that, as in Section 3.1, one gets the equation in distribution

$$U_{(1,0)}(n) \overset{\mathcal{L}}{=} \sum_{k=1}^{a+1} U^{(k)}_{(1,0)}\left(\frac{J_k(n)-1}{S}\right) + \sum_{k=a+2}^{S+1} U^{(k)}_{(0,1)}\left(\frac{J_k(n)-1}{S}\right) \qquad (14)$$

where the $U^{(k)}_{(1,0)}$ and the $U^{(k)}_{(0,1)}$ are respective copies of the random vector processes $U_{(1,0)}$ and $U_{(0,1)}$, all being independent of each other and of the $J_k$'s. Besides, the random vector $(J_1(n), \ldots, J_{S+1}(n))$ is exactly distributed like the composition vector at time $(n-1)$ of an $(S+1)$-color Polya urn process having $S I_{S+1}$ as replacement matrix and $(1, \ldots, 1)$ as initial composition vector, so that, by Proposition 2 in Section 6,

$$\frac{1}{nS}\left(J_1(n), \ldots, J_{S+1}(n)\right) \longrightarrow V = (V_1, \ldots, V_{S+1})$$

almost surely, the random vector $V$ being Dirichlet$\left(\frac{1}{S}, \ldots, \frac{1}{S}\right)$-distributed. Like in Section 3.1, divide Equation (14) by $n^{\sigma}$, take the image by the second projection $u_2$ and pass to the limit $n \to \infty$ using Theorem 1. This leads to the following theorem.



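Theorem 5 As defined just above by (13), let $X$ and $Y$ be the elementary limit laws of a large two-color discrete time Polya urn process with replacement matrix $R = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, balance $S = a + b = c + d$ and ratio $\sigma > \frac{1}{2}$. Then, $X$ and $Y$ satisfy the distributional equations system

$$\begin{cases} X \overset{\mathcal{L}}{=} \displaystyle\sum_{k=1}^{a+1} V_k^{\sigma} X^{(k)} + \sum_{k=a+2}^{S+1} V_k^{\sigma} Y^{(k)} \\[2ex] Y \overset{\mathcal{L}}{=} \displaystyle\sum_{k=1}^{c} V_k^{\sigma} X^{(k)} + \sum_{k=c+1}^{S+1} V_k^{\sigma} Y^{(k)} \end{cases} \qquad (15)$$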
where

(i) $V = (V_1, \ldots, V_{S+1})$ is a Dirichlet distributed random vector, with parameters $\left(\frac{1}{S}, \ldots, \frac{1}{S}\right)$;

(ii) the $X^{(k)}$ and the $Y^{(k)}$ are respective copies of $X$ and $Y$, all being independent of each other and of $V$.

Notice that any $V_k$ is distributed like a random variable $U^S$, $U$ being uniformly distributed on $[0, 1]$. Equivalently, $V_k^{\sigma}$ is distributed like $U^m$ (notations of Section 2.1).



3.3 Decomposition properties in continuous time 

Remember that $(U^{CT}(t))_{t\geq 0}$ is a continuous time branching process. Thanks to the branching property, the decomposition properties of this process are somehow automatic. First,

$$U^{CT}_{(\alpha,\beta)}(t) \overset{\mathcal{L}}{=} [\alpha]\, U^{CT}_{(1,0)}(t) + [\beta]\, U^{CT}_{(0,1)}(t),$$

where the notation $[n]X$ means the sum of $n$ independent random variables having the same distribution as $X$. Consequently, passing to the limit when $t \to +\infty$ after normalization and projection yields

$$W^{CT}_{(\alpha,\beta)} \overset{\mathcal{L}}{=} [\alpha]\, W^{CT}_{(1,0)} + [\beta]\, W^{CT}_{(0,1)}. \qquad (16)$$

This convolution formula expresses how the limit law $W^{CT}_{(\alpha,\beta)}$ is decomposed in terms of the elementary limit laws $W^{CT}_{(1,0)}$ and $W^{CT}_{(0,1)}$. It corresponds to the discrete time decomposition shown in Theorem 4.

Now start from one red ball or from one black ball, and apply again the branching property at the first splitting time. As before, define $X^{CT}$ and $Y^{CT}$ by

$$X^{CT} := W^{CT}_{(1,0)} \quad\text{and}\quad Y^{CT} := W^{CT}_{(0,1)}. \qquad (17)$$

Then, with the above Theorem 2, one gets the following result.

Theorem 6 ([17], [11]) Let $X = X^{CT}$ and $Y = Y^{CT}$ be the elementary limit laws of a large two-color continuous time Polya urn process with replacement matrix $R = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, balance $S = a + b = c + d$ and ratio $\sigma > \frac{1}{2}$, as defined just above by (17). Then, $X$ and $Y$ satisfy the distributional equations system

$$\begin{cases} X \overset{\mathcal{L}}{=} U^{m}\left(\displaystyle\sum_{k=1}^{a+1} X^{(k)} + \sum_{k=a+2}^{S+1} Y^{(k)}\right) \\[2ex] Y \overset{\mathcal{L}}{=} U^{m}\left(\displaystyle\sum_{k=1}^{c} X^{(k)} + \sum_{k=c+1}^{S+1} Y^{(k)}\right) \end{cases} \qquad (18)$$

where $U$ is uniform on $[0,1]$, and where the $X^{(k)}$ and the $Y^{(k)}$ are respective copies of $X^{CT}$ and $Y^{CT}$, all being independent of each other and of $U$.
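Taking expectations in (18) gives the first two moments of $X^{CT}$ and $Y^{CT}$ explicitly. With $\mathbb{E}\,U^m = \frac{1}{m+1}$ and $\mathbb{E}\,U^{2m} = \frac{1}{2m+1}$, the second moments solve a $2\times 2$ linear system whose determinant is $(2m - S)\,m$, which is nonzero precisely because $2m > S$ for a large urn. The sketch below (hypothetical function `ct_moments`) assumes the means $b/S$ and $-c/S$ of Section 2.2 and works in exact rational arithmetic.

```python
from fractions import Fraction


def ct_moments(R):
    """(E X, E Y, E X^2, E Y^2) for X = W^CT_(1,0), Y = W^CT_(0,1), from (18).

    (2m+1) E X^2 = (a+1) E X^2 + b E Y^2 + (m+1)^2 p^2 - (a+1) p^2 - b q^2,
    (2m+1) E Y^2 = c E X^2 + (d+1) E Y^2 + (m+1)^2 q^2 - c p^2 - (d+1) q^2,
    where p = E X and q = E Y.
    """
    (a, b), (c, d) = R
    S, m = a + b, a - c
    if not (2 * m > S and b * c != 0):
        raise ValueError("expects a large, non-triangular urn")
    p, q = Fraction(b, S), Fraction(-c, S)     # means of W^CT_(1,0), W^CT_(0,1)
    a11, a12 = 2 * m - a, -b
    a21, a22 = -c, 2 * m - d
    r1 = ((m + 1) ** 2 - (a + 1)) * p * p - b * q * q
    r2 = ((m + 1) ** 2 - (d + 1)) * q * q - c * p * p
    det = a11 * a22 - a12 * a21                # equals (2m - S) * m
    s = (r1 * a22 - a12 * r2) / det
    t = (a11 * r2 - a21 * r1) / det
    return p, q, s, t
```

For $R = \begin{pmatrix} 6 & 1 \\ 2 & 5 \end{pmatrix}$ this gives $\mathbb{E}X^2 = 29/49$ and $\mathbb{E}Y^2 = 44/49$, hence positive variances $4/7$ and $40/49$.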

Remark 1 As mentioned above, it is shown in [11] that $X^{CT}$ (and $Y^{CT}$) admit densities. The proof is based on the computation of the Fourier transform of $X^{CT}$ in terms of the inverse of an abelian integral on a Fermat curve. This method is specific to 2-color urn processes. Theorems 5 and 6 give a new way of proving this fact by means of techniques that can be adapted from Liu's method (see for example). This alternative method opens a perspective (addressed in a forthcoming paper): it can be applied to show that the limit laws of $d$-color large urns admit densities as well.



3.4 Connexion between continuous-time and discrete-time systems



In Section 2.3, we described the connexion between the limit laws of large urns in discrete and continuous time, called the martingale connexion. It was seen as a consequence of the embedding into continuous time of the initial discrete time Markov chain defining the urn process. In this paragraph, we show how the solutions of the fixed point systems (15) and (18) are related. Since these systems characterize the urn limit laws (as proved in Section 4), this provides an alternative point of view on the martingale connexion.



Proposition 1 (i) Let $X$ and $Y$ be solutions of (15) and let $\xi$ be an independent Gamma-distributed random variable with parameter $\frac1S$. Then, $\xi^{\sigma}X$ and $\xi^{\sigma}Y$ are solutions of (18).

(ii) Conversely, let $X$ and $Y$ be solutions of (18) and let $\xi$ be an independent Gamma-distributed random variable with parameter $\frac1S$. Then, $\xi^{-\sigma}X$ and $\xi^{-\sigma}Y$ are solutions of (15).



The assertions of Proposition 1 are particular cases of the following lemma, which is an elementary result in probability theory.

Lemma 1 Consider the two following distributional equations with unknown real-valued random variables $X, X_1, \dots, X_{S+1}$.

1- Equation D:
$$X \stackrel{\mathcal L}{=} \sum_{1\le k\le S+1} V_k^{\sigma} X_k,$$
where $V = (V_1, \dots, V_{S+1})$ is a Dirichlet-distributed random vector with parameter $(\frac1S, \dots, \frac1S)$.

2- Equation C:
$$X \stackrel{\mathcal L}{=} V^{\sigma} \sum_{1\le k\le S+1} X_k,$$
where $V$ is a Beta-distributed random variable with parameter $(\frac1S, 1)$ (in other words, $V^{1/S}$ is uniformly distributed on $[0,1]$).

Let $X, X_1, \dots, X_{S+1}$ be real-valued random variables.

(i) If $X, X_1, \dots, X_{S+1}$ satisfy Equation D and if $\xi, \xi_1, \dots, \xi_{S+1}$ are i.i.d. Gamma$(\frac1S)$-distributed random variables, then $\xi^{\sigma}X, \xi_1^{\sigma}X_1, \dots, \xi_{S+1}^{\sigma}X_{S+1}$ satisfy Equation C.

(ii) Conversely, if $X, X_1, \dots, X_{S+1}$ satisfy Equation C and if $\xi, \xi_1, \dots, \xi_{S+1}$ are i.i.d. Gamma$(\frac1S)$-distributed random variables, then $\xi^{-\sigma}X, \xi_1^{-\sigma}X_1, \dots, \xi_{S+1}^{-\sigma}X_{S+1}$ satisfy Equation D.

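Proof. (i) Suppose that $X \stackrel{\mathcal L}{=} \sum_{1\le k\le S+1} V_k^{\sigma}X_k$. Since a Dirichlet random vector can be seen as independent Gamma-distributed random variables conditioned to have a sum equal to 1 (see Section 6), more precisely since $\bigl(\xi_k/(\xi_1+\dots+\xi_{S+1})\bigr)_{1\le k\le S+1}$ is Dirichlet$(\frac1S,\dots,\frac1S)$-distributed and independent of $\xi_1+\dots+\xi_{S+1}$, one can write

$$\sum_{1\le k\le S+1}\xi_k^{\sigma}X_k \stackrel{\mathcal L}{=} \bigl(\xi_1+\dots+\xi_{S+1}\bigr)^{\sigma}\sum_{1\le k\le S+1}V_k^{\sigma}X_k \stackrel{\mathcal L}{=} \bigl(\xi_1+\dots+\xi_{S+1}\bigr)^{\sigma}X.$$

Since $\xi_1+\dots+\xi_{S+1}$ is Gamma$(1+\frac1S)$-distributed, its product with an independent Beta$(\frac1S,1)$-distributed random variable $V$ is Gamma$(\frac1S)$-distributed (equivalently: if $\xi$ and $\xi'$ are independent and respectively Gamma$(\frac1S)$- and Gamma$(1)$-distributed, the quotient $\xi/(\xi+\xi')$ is Beta$(\frac1S,1)$-distributed and independent of $\xi+\xi'$). Hence $V^{\sigma}\sum_k\xi_k^{\sigma}X_k \stackrel{\mathcal L}{=} \xi^{\sigma}X$, leading to the result. The reciprocal result (ii), of the same vein, is left to the reader. ■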
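The proof relies on two standard facts about Gamma variables: normalizing i.i.d. Gamma$(\frac1S)$ variables by their sum gives a Dirichlet$(\frac1S,\dots,\frac1S)$ vector independent of that sum, and a Beta$(\frac1S,1)$ variable times an independent Gamma$(1+\frac1S)$ variable is Gamma$(\frac1S)$-distributed. As a quick sanity check of the second fact (not part of the argument), the Python sketch below compares the first two empirical moments of such a product with those of Gamma$(\frac1S)$, whose mean and variance both equal $\frac1S$. The function name, seed and sample size are illustrative choices.

```python
import random
import statistics


def beta_times_gamma(S, n_samples, seed=0):
    """Monte Carlo sample of V * G, with V ~ Beta(1/S, 1) and G ~ Gamma(1 + 1/S), independent."""
    rng = random.Random(seed)
    k = 1.0 / S
    return [rng.betavariate(k, 1.0) * rng.gammavariate(1.0 + k, 1.0)
            for _ in range(n_samples)]


if __name__ == "__main__":
    S = 3
    sample = beta_times_gamma(S, 100_000)
    # A Gamma(1/S) variable with unit scale has mean 1/S and variance 1/S.
    print(statistics.fmean(sample), statistics.pvariance(sample), 1 / S)
```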



4 Smoothing transforms 

This section is devoted to the existence and the unicity of solutions of the distributional systems (15) and (18). By Proposition 1 just above, it is sufficient to deal with only one of them. Notice that existence and unicity of solutions of the discrete-time system (15) could be deduced from the general result in Neininger-Rüschendorf [26]; nevertheless, we give hereunder a short and self-contained proof of Theorem 7 in order to make the contraction method explicit in the case of large Polya urns. The proof is reminiscent of the one in
Fill-Kapur [H]. 

When $A$ is a real number, let $\mathcal M_2(A)$ be the space of probability distributions on $\mathbb R$ that have $A$ as expectation and a finite second moment, endowed with a complete metric space structure by the Wasserstein distance. Note first that when $X$ and $Y$ are solutions of (15) or (18) that have respectively $B$ and $C$ as expectations, then $cB + bC = 0$ (elementary computation). In Theorems 7 and 8 we prove that when $B$ and $C$ are two real numbers that satisfy $cB + bC = 0$, the systems (15) and (18) both have a unique solution in the product metric space $\mathcal M_2(B)\times\mathcal M_2(C)$. To do so, we use the Banach contraction method.

Since $(\mathbb E X, \mathbb E Y)$ is proportional to $(b, -c)$ in both continuous time and discrete time urn processes (Formulae (7) and (9)), this result shows that the systems (15) and (18) characterize the limit distributions $W^{DT}_{(1,0)}$ and $W^{DT}_{(0,1)}$ on one hand, $W^{CT}_{(1,0)}$ and $W^{CT}_{(0,1)}$ on the other hand.
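For the continuous time system (18), the elementary computation mentioned above can be spelled out as follows; it only uses $\mathbb E(U^m) = \frac{1}{m+1}$, the independence of $U$ from the copies, and $m = a - c = d - b$. The discrete time case is identical once $\mathbb E(U^m)$ is replaced by $\mathbb E(V_k^{\sigma}) = \frac{1}{m+1}$.

```latex
\begin{aligned}
B = \mathbb{E}X &= \mathbb{E}\bigl(U^m\bigr)\bigl((a+1)B + bC\bigr) = \frac{(a+1)B + bC}{m+1}
  &&\iff (m-a)B = bC \iff cB + bC = 0,\\
C = \mathbb{E}Y &= \mathbb{E}\bigl(U^m\bigr)\bigl(cB + (d+1)C\bigr) = \frac{cB + (d+1)C}{m+1}
  &&\iff (m-d)C = cB \iff cB + bC = 0,
\end{aligned}
```

since $m - a = -c$ and $m - d = -b$.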

4.1 The Wasserstein distance 

Let $A\in\mathbb R$. The Wasserstein distance on $\mathcal M_2(A)$ is defined as follows:
$$d_W(\mu_1,\mu_2) = \min_{(X_1,X_2)}\bigl(\mathbb E(X_1-X_2)^2\bigr)^{1/2},$$
where the minimum is taken over random vectors $(X_1,X_2)$ on $\mathbb R^2$ having respective marginal distributions $\mu_1$ and $\mu_2$; the minimum is attained by the Kantorovich-Rubinstein Theorem. With this distance, $\mathcal M_2(A)$ is a complete metric space (see for instance Dudley [...]). Let $(B,C)\in\mathbb R^2$. The product space $\mathcal M_2(B)\times\mathcal M_2(C)$ is equipped with the product metric, defined (for example) by the distance
$$d\bigl((\mu_1,\nu_1),(\mu_2,\nu_2)\bigr) = \max\bigl\{d_W(\mu_1,\mu_2),\, d_W(\nu_1,\nu_2)\bigr\}.$$
Of course, this product remains a complete metric space.
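On the real line, the minimum defining $d_W$ is attained by the monotone (quantile) coupling, a classical fact that the text does not need but which makes $d_W$ easy to evaluate numerically. For two empirical measures with the same number of equally weighted atoms, it amounts to sorting both samples and pairing them in order. The following Python sketch, with hypothetical helper names, computes this special case and the max-product distance $d$ defined above.

```python
import math


def wasserstein2_empirical(xs, ys):
    """d_W between two empirical measures with equally many, equally weighted atoms.

    On the real line the optimal coupling pairs the order statistics of both samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of the same size")
    squared = sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys)))
    return math.sqrt(squared / len(xs))


def product_distance(mu1, nu1, mu2, nu2):
    """Max-product distance d((mu1, nu1), (mu2, nu2)) for empirical measures."""
    return max(wasserstein2_empirical(mu1, mu2), wasserstein2_empirical(nu1, nu2))


if __name__ == "__main__":
    print(wasserstein2_empirical([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))  # 1.0
```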



4.2 Contraction method in discrete time 



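Let us recall the fixed point system (15) satisfied by $(X^{DT}, Y^{DT})$, the elementary limits of a large two-color discrete time Polya urn process:
$$
\begin{cases}
X \stackrel{\mathcal L}{=} \sum_{k=1}^{a+1} V_k^{\sigma} X^{(k)} + \sum_{k=a+2}^{S+1} V_k^{\sigma} Y^{(k)}\\[4pt]
Y \stackrel{\mathcal L}{=} \sum_{k=1}^{c} V_k^{\sigma} X^{(k)} + \sum_{k=c+1}^{S+1} V_k^{\sigma} Y^{(k)}
\end{cases}
$$
where $V = (V_1,\dots,V_{S+1})$ is a Dirichlet$(\frac1S,\dots,\frac1S)$-distributed random vector, and where $X, X^{(k)}$ and $Y, Y^{(k)}$ are respective copies of $X^{DT}$ and $Y^{DT}$, all being independent of each other and of $V$.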

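Let $\mathcal M_2$ be the space of square-integrable probability measures on $\mathbb R$, and let $\mathcal L(Z)$ denote the law of a random variable $Z$. When $(B,C)\in\mathbb R^2$, let $K_1$ be the function defined on $\mathcal M_2(B)\times\mathcal M_2(C)$ by:
$$K_1 : \mathcal M_2(B)\times\mathcal M_2(C) \to \mathcal M_2,\qquad (\mu,\nu)\mapsto \mathcal L\Bigl(\sum_{k=1}^{a+1}V_k^{\sigma}X^{(k)} + \sum_{k=a+2}^{S+1}V_k^{\sigma}Y^{(k)}\Bigr),$$
where $X^{(1)},\dots,X^{(a+1)}$ are $\mu$-distributed random variables, $Y^{(a+2)},\dots,Y^{(S+1)}$ are $\nu$-distributed random variables, $V = (V_1,\dots,V_{S+1})$ is a Dirichlet-distributed random vector with parameter $(\frac1S,\dots,\frac1S)$, the $X^{(k)}$, $Y^{(k)}$ and $V$ being all independent of each other. Similarly, with the same conventions, let $K_2$ be defined by
$$K_2 : \mathcal M_2(B)\times\mathcal M_2(C) \to \mathcal M_2,\qquad (\mu,\nu)\mapsto \mathcal L\Bigl(\sum_{k=1}^{c}V_k^{\sigma}X^{(k)} + \sum_{k=c+1}^{S+1}V_k^{\sigma}Y^{(k)}\Bigr).$$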

A simple computation shows that if $(\mu,\nu)\in\mathcal M_2(B)\times\mathcal M_2(C)$, then
$$\mathbb E K_1(\mu,\nu) = \frac{(a+1)B + bC}{m+1} \quad\text{and}\quad \mathbb E K_2(\mu,\nu) = \frac{cB + (d+1)C}{m+1},$$
where $\mathbb E K_i(\mu,\nu)$ denotes the expectation of the distribution $K_i(\mu,\nu)$, so that, since $m = a - c = d - b$, the relation $cB + bC = 0$ is a necessary and sufficient condition for the product function $(K_1, K_2)$ to map $\mathcal M_2(B)\times\mathcal M_2(C)$ into itself.
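The coefficient $\frac{1}{m+1}$ comes from the marginals of $V$: by the facts recalled in Section 6.1, each $V_k$ is Beta$(\frac1S,1)$-distributed, because the Dirichlet parameters sum to $\frac{S+1}{S}$. Its real moments are explicit, and the same formula with exponent $2\sigma$ gives the constant that appears in the proof of Lemma 2 (recall that $S\sigma = m$):

```latex
\mathbb{E}\bigl(V_k^{s}\bigr)
  = \frac{1}{B(\frac1S,1)}\int_0^1 x^{s+\frac1S-1}\,dx
  = \frac{1/S}{s+1/S}
  = \frac{1}{1+Ss}\quad \bigl(s>-\tfrac1S\bigr),
\qquad
\mathbb{E}\bigl(V_k^{\sigma}\bigr)=\frac{1}{m+1},
\qquad
\mathbb{E}\bigl(V_k^{2\sigma}\bigr)=\frac{1}{2m+1}.
```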

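Lemma 2 Let $B$ and $C$ be real numbers that satisfy $cB + bC = 0$. Then, the smoothing transform
$$K : \mathcal M_2(B)\times\mathcal M_2(C)\to\mathcal M_2(B)\times\mathcal M_2(C),\qquad (\mu,\nu)\mapsto\bigl(K_1(\mu,\nu), K_2(\mu,\nu)\bigr)$$
is $\sqrt{\frac{S+1}{2m+1}}$-Lipschitz. In particular, it is a contraction.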



Theorem 7 (i) When $B$ and $C$ are real numbers that satisfy $cB + bC = 0$, System (15) has a unique solution in $\mathcal M_2(B)\times\mathcal M_2(C)$.

(ii) The pair $(X^{DT}, Y^{DT})$ is the unique solution of the distributional System (15) having $(\mathbb E X^{DT}, \mathbb E Y^{DT})$ as expectation and a finite second moment.

Theorem 7 is a direct consequence of Lemma 2 and of Banach's fixed point theorem.
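Banach's theorem also says that iterating $K$ from any starting pair in $\mathcal M_2(B)\times\mathcal M_2(C)$ converges to the fixed point. The Python sketch below imitates this iteration by "population dynamics", a standard Monte Carlo device that is not used in the paper: each law is represented by a pool of samples, and $K$ is applied by drawing the copies $X^{(k)}, Y^{(k)}$ from the current pools. Two choices are ours and only approximate $K$: resampling makes the copies only nearly independent, and the pools are recentred after each step so that the means stay exactly $B$ and $C$ (otherwise sampling noise would be amplified, since $(K_1,K_2)$ acts on the pair of means through $(R+I)/(m+1)$, whose eigenvalue $(S+1)/(m+1)$ exceeds 1). The exact second moments used for comparison come from taking second moments in (15), with $\mathbb E(V_j^{\sigma}V_k^{\sigma})$ read off from the Dirichlet integral (22). All function names are hypothetical.

```python
import math
import random
import statistics


def dirichlet(params, rng):
    """Dirichlet sample obtained by normalising independent unit-scale Gamma variables."""
    g = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(g)
    return [x / total for x in g]


def exact_second_moments(a, b, c, d, B, C):
    """(E X^2, E Y^2) for the solution of (15) in M2(B) x M2(C), from second moments of (15)."""
    S, m = a + b, a - c
    sigma, nu = m / S, (S + 1) / S
    e2 = 1.0 / (2 * m + 1)  # E V_k^(2 sigma)
    # E V_j^sigma V_k^sigma for j != k, by the Dirichlet moment formula with real exponents
    e11 = (math.gamma(nu) / math.gamma(nu + 2 * sigma)
           * (math.gamma(1 / S + sigma) / math.gamma(1 / S)) ** 2)

    def cross(nx, ny):
        # sum over j != k of E Z_j E Z_k, for nx terms of mean B and ny terms of mean C
        s1, s2 = nx * B + ny * C, nx * B * B + ny * C * C
        return s1 * s1 - s2

    # Solve (I - e2 (R + I)) q = e11 r by Cramer's rule.
    m11, m12 = 1 - e2 * (a + 1), -e2 * b
    m21, m22 = -e2 * c, 1 - e2 * (d + 1)
    r1, r2 = e11 * cross(a + 1, b), e11 * cross(c, d + 1)
    det = m11 * m22 - m12 * m21
    return (r1 * m22 - m12 * r2) / det, (m11 * r2 - m21 * r1) / det


def population_dynamics(a, b, c, d, B, C, pool=2000, steps=20, seed=1):
    """Approximate the fixed point of K = (K1, K2) by iterating it on pools of samples."""
    rng = random.Random(seed)
    S, m = a + b, a - c
    sigma = m / S
    xs, ys = [B] * pool, [C] * pool  # Dirac masses at B and C lie in M2(B) x M2(C)

    def draw(n_x):
        # One sample of sum_k V_k^sigma Z_k: the first n_x copies come from xs, the rest from ys.
        w = dirichlet([1.0 / S] * (S + 1), rng)
        return sum(wk ** sigma * (rng.choice(xs) if k < n_x else rng.choice(ys))
                   for k, wk in enumerate(w))

    for _ in range(steps):
        new_x = [draw(a + 1) for _ in range(pool)]  # K1: a + 1 copies of X, b copies of Y
        new_y = [draw(c) for _ in range(pool)]      # K2: c copies of X, d + 1 copies of Y
        mx, my = statistics.fmean(new_x), statistics.fmean(new_y)
        xs = [x - mx + B for x in new_x]  # recentre: keep the means exactly B and C
        ys = [y - my + C for y in new_y]
    return xs, ys
```

With, for instance, $R = \begin{pmatrix}6&1\\1&6\end{pmatrix}$ (so $S = 7$, $m = 5$, $\sigma = \frac57$) and $(B, C) = (1, -1)$, which satisfies $cB + bC = 0$, the empirical second moments of the pools returned by `population_dynamics(6, 1, 1, 6, 1.0, -1.0)` can be compared with `exact_second_moments(6, 1, 1, 6, 1.0, -1.0)`.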

Proof of Lemma 2. Let $(\mu_1,\nu_1)$ and $(\mu_2,\nu_2)$ in $\mathcal M_2(B)\times\mathcal M_2(C)$. Let $V = (V_1,\dots,V_{S+1})$ be a Dirichlet random vector with parameter $(\frac1S,\dots,\frac1S)$. For $1\le k\le a+1$, let $(X_1^{(k)},X_2^{(k)})$ be a random vector with marginal distributions $\mu_1$ and $\mu_2$, and for $a+2\le k\le S+1$, let $(Y_1^{(k)},Y_2^{(k)})$ be a random vector with marginal distributions $\nu_1$ and $\nu_2$; the $X$-pairs are identically distributed, as are the $Y$-pairs, and all these pairs are independent of each other and independent of $V$. Then,

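$$
\begin{aligned}
d_W^2\bigl(K_1(\mu_1,\nu_1),K_1(\mu_2,\nu_2)\bigr)
&\le \operatorname{Var}\Bigl(\sum_{k=1}^{a+1}V_k^{\sigma}\bigl(X_1^{(k)}-X_2^{(k)}\bigr)+\sum_{k=a+2}^{S+1}V_k^{\sigma}\bigl(Y_1^{(k)}-Y_2^{(k)}\bigr)\Bigr)\\
&= \mathbb E\,\operatorname{Var}\Bigl(\sum_{k=1}^{a+1}V_k^{\sigma}\bigl(X_1^{(k)}-X_2^{(k)}\bigr)+\sum_{k=a+2}^{S+1}V_k^{\sigma}\bigl(Y_1^{(k)}-Y_2^{(k)}\bigr)\Bigm| V\Bigr)\\
&\quad+\operatorname{Var}\,\mathbb E\Bigl(\sum_{k=1}^{a+1}V_k^{\sigma}\bigl(X_1^{(k)}-X_2^{(k)}\bigr)+\sum_{k=a+2}^{S+1}V_k^{\sigma}\bigl(Y_1^{(k)}-Y_2^{(k)}\bigr)\Bigm| V\Bigr)
\end{aligned}
$$
(the first inequality holds because the two sums have the same expectation, so that the second moment of their difference is its variance), thanks to the law of total variance. Since $V$ is independent of the $X_j^{(k)}$ and of the $Y_j^{(k)}$, and since $\mathbb E\bigl(X_1^{(k)}-X_2^{(k)}\bigr) = B - B = 0$ and $\mathbb E\bigl(Y_1^{(k)}-Y_2^{(k)}\bigr) = C - C = 0$, the conditional expectation given $V$ vanishes and one gets
$$
\begin{aligned}
d_W^2\bigl(K_1(\mu_1,\nu_1),K_1(\mu_2,\nu_2)\bigr)
&\le \sum_{k=1}^{a+1}\mathbb E\bigl(V_k^{2\sigma}\bigr)\operatorname{Var}\bigl(X_1^{(k)}-X_2^{(k)}\bigr)+\sum_{k=a+2}^{S+1}\mathbb E\bigl(V_k^{2\sigma}\bigr)\operatorname{Var}\bigl(Y_1^{(k)}-Y_2^{(k)}\bigr)\\
&= \frac{a+1}{2m+1}\operatorname{Var}\bigl(X_1^{(1)}-X_2^{(1)}\bigr)+\frac{b}{2m+1}\operatorname{Var}\bigl(Y_1^{(S+1)}-Y_2^{(S+1)}\bigr),
\end{aligned}
$$
because each $V_k$ is Beta$(\frac1S,1)$-distributed, so that $\mathbb E\bigl(V_k^{2\sigma}\bigr) = \frac{1}{2m+1}$.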


Since the inequality holds for any couplings $(X_1^{(k)},X_2^{(k)})$ and $(Y_1^{(k)},Y_2^{(k)})$ with respective marginal distributions $\mu_1,\mu_2$ and $\nu_1,\nu_2$, choosing optimal couplings leads to
$$
\begin{aligned}
d_W\bigl(K_1(\mu_1,\nu_1),K_1(\mu_2,\nu_2)\bigr)
&\le \sqrt{\frac{a+1}{2m+1}\,d_W(\mu_1,\mu_2)^2+\frac{b}{2m+1}\,d_W(\nu_1,\nu_2)^2}\\
&\le \sqrt{\frac{S+1}{2m+1}}\;d\bigl((\mu_1,\nu_1),(\mu_2,\nu_2)\bigr).
\end{aligned}
$$



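A very similar computation shows that
$$d_W\bigl(K_2(\mu_1,\nu_1),K_2(\mu_2,\nu_2)\bigr) \le \sqrt{\frac{S+1}{2m+1}}\;d\bigl((\mu_1,\nu_1),(\mu_2,\nu_2)\bigr),$$
so that, finally,
$$d\bigl(K(\mu_1,\nu_1),K(\mu_2,\nu_2)\bigr) \le \sqrt{\frac{S+1}{2m+1}}\;d\bigl((\mu_1,\nu_1),(\mu_2,\nu_2)\bigr),$$
making the proof complete. Note that the assumption $\sigma = \frac mS > \frac12$ guarantees that the Lipschitz constant is in $]0,1[$. ■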
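For a given replacement matrix, the constant of Lemma 2 is a one-line computation. The Python helper below (a hypothetical name) also checks the balance condition and that the urn is large, which is exactly what makes the constant smaller than 1, since $S + 1 < 2m + 1 \iff \sigma > \frac12$.

```python
import math


def contraction_constant(a, b, c, d):
    """Lipschitz constant sqrt((S + 1) / (2m + 1)) of the smoothing transform K of Lemma 2."""
    S = a + b
    if c + d != S:
        raise ValueError("the urn must be balanced: a + b == c + d")
    m = a - c  # second eigenvalue of R, also equal to d - b
    if not 2 * m > S:
        raise ValueError("not a large urn: sigma = m / S must exceed 1/2")
    return math.sqrt((S + 1) / (2 * m + 1))


if __name__ == "__main__":
    print(contraction_constant(6, 1, 1, 6))  # sqrt(8/11), about 0.853
```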



4.3 Contraction method in continuous time 

In continuous time, the laws of $X^{CT}$ and $Y^{CT}$ are solutions of the following system (cf. (18)):
$$
\begin{cases}
X \stackrel{\mathcal L}{=} U^{m}\Bigl(\sum_{k=1}^{a+1} X^{(k)} + \sum_{k=a+2}^{S+1} Y^{(k)}\Bigr)\\[4pt]
Y \stackrel{\mathcal L}{=} U^{m}\Bigl(\sum_{k=1}^{c} X^{(k)} + \sum_{k=c+1}^{S+1} Y^{(k)}\Bigr)
\end{cases}
$$
The following theorem, which is the continuous time version of Theorem 7, can be proved in two different ways. One can combine Theorem 7 with the connexion established in Proposition 1. Alternatively, one can adapt the arguments of Theorem 7 to make a direct proof. Details are left to the reader.
Theorem 8 (i) When $B$ and $C$ are real numbers that satisfy $cB + bC = 0$, System (18) has a unique solution in $\mathcal M_2(B)\times\mathcal M_2(C)$.

(ii) The pair $(X^{CT}, Y^{CT})$ is the unique solution of the distributional System (18) having $(\mathbb E X^{CT}, \mathbb E Y^{CT})$ as expectation and a finite second moment.



5 Moments 

This section is devoted to the asymptotics of the moments of the limit variables $W^{DT}$ and $W^{CT}$. We shall see that they are big, but not too big. Observe first that the connexion (10) allows us to study only one of the two cases, discrete or continuous. We chose to focus on the continuous case, since the fixed point equation system is slightly easier to deal with. Let us recall here system (18):
$$
\begin{cases}
X \stackrel{\mathcal L}{=} U^{m}\Bigl(\sum_{k=1}^{a+1} X^{(k)} + \sum_{k=a+2}^{S+1} Y^{(k)}\Bigr)\\[4pt]
Y \stackrel{\mathcal L}{=} U^{m}\Bigl(\sum_{k=1}^{c} X^{(k)} + \sum_{k=c+1}^{S+1} Y^{(k)}\Bigr)
\end{cases}
$$
where $U$ is uniform on $[0,1]$, where $X, X^{(k)}$ and $Y, Y^{(k)}$ are respective copies of $X^{CT}$ and $Y^{CT}$, all being independent of each other and of $U$.

Up to now, what is known about the size of these moments is contained in [...], where it is proved that the radius of convergence of the Laplace series of a non-trivial square-integrable solution of (18) is equal to zero. Consequently, by the Hadamard formula for the radius of convergence,
$$\limsup_{p\to\infty}\Bigl(\frac{\mathbb E|X|^p}{p!}\Bigr)^{1/p} = +\infty.$$
In other words, for any constant $C$, $C < \bigl(\frac{\mathbb E|X|^p}{p!}\bigr)^{1/p}$ for infinitely many $p$.

The following lemma gives an upper bound for $\bigl(\frac{\mathbb E|X|^p}{p!}\bigr)^{1/p}$. It is the argument leading to Theorem 9, where it is proved that the law of $X$ is determined by its moments.



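Lemma 3 If $X$ and $Y$ are integrable solutions of (18), they admit absolute moments of all orders $p\ge1$, and the sequences (indexed by $p\ge2$)
$$\Bigl(\frac{\mathbb E|X|^p}{p!\,\log^p p}\Bigr)^{1/p}\quad\text{and}\quad\Bigl(\frac{\mathbb E|Y|^p}{p!\,\log^p p}\Bigr)^{1/p}$$
are bounded.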
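Proof. Let $\varphi(p) := \log(p+2)$ and define
$$u_p := \frac{\mathbb E|X|^p}{p!\,\varphi(p)^p}\quad\text{and}\quad v_p := \frac{\mathbb E|Y|^p}{p!\,\varphi(p)^p}$$
(so that $u_0 = v_0 = 1$). We show by induction on $p\ge1$ that $(u_p)^{1/p}$ and $(v_p)^{1/p}$ are finite and define bounded sequences. Notice that a similar technique is used in Kahane-Peyrière [12]. Take the power $p$ in the first equation, notice that $\mathbb E\,U^{mp} = \frac{1}{mp+1}$, and isolate the two extreme terms. One gets (remember that $S+1 = a+1+b$)
$$\mathbb E|X|^p \le \frac{1}{mp+1}\Bigl((a+1)\,\mathbb E|X|^p + b\,\mathbb E|Y|^p + \sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\binom{p}{p_1,\dots,p_{S+1}}\mathbb E|X|^{p_1}\cdots\mathbb E|X|^{p_{a+1}}\,\mathbb E|Y|^{p_{a+2}}\cdots\mathbb E|Y|^{p_{S+1}}\Bigr)$$
or also
$$(mp-a)\,\mathbb E|X|^p \le b\,\mathbb E|Y|^p + \sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\binom{p}{p_1,\dots,p_{S+1}}\mathbb E|X|^{p_1}\cdots\mathbb E|X|^{p_{a+1}}\,\mathbb E|Y|^{p_{a+2}}\cdots\mathbb E|Y|^{p_{S+1}}.$$
An analogous inequality holds for $\mathbb E|Y|^p$, leading to the system
$$
\begin{cases}
(mp-a)\,u_p \le b\,v_p + \displaystyle\sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}} u_{p_1}\cdots u_{p_{a+1}}\,v_{p_{a+2}}\cdots v_{p_{S+1}}\,\frac{\varphi(p_1)^{p_1}\cdots\varphi(p_{S+1})^{p_{S+1}}}{\varphi(p)^p}\\[6pt]
(mp-d)\,v_p \le c\,u_p + \displaystyle\sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}} u_{p_1}\cdots u_{p_c}\,v_{p_{c+1}}\cdots v_{p_{S+1}}\,\frac{\varphi(p_1)^{p_1}\cdots\varphi(p_{S+1})^{p_{S+1}}}{\varphi(p)^p}
\end{cases}
\qquad(19)
$$
Since the eigenvalues of the matrix $R = \begin{pmatrix}a&b\\c&d\end{pmatrix}$ are $m$ and $S$ and since $2m > S$ (the urn is assumed to be large), all matrices $mp\,I - R$ ($p\ge2$) are invertible, so that System (19) implies by induction on $p$ that solutions $X$ and $Y$ of System (18) admit absolute moments of all orders as soon as they are integrable.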
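The invertibility of $mp\,I - R$ can be made explicit. Since $S$ and $m$ are the eigenvalues of $R$, its trace and determinant are $a + d = S + m$ and $ad - bc = Sm$, so the determinant of $mp\,I - R$ factors as below; for $p\ge2$, both factors are positive because $mp \ge 2m > S > m > 0$ (large urn, $\frac12 < \sigma < 1$).

```latex
\det(mp\,I - R) = (mp-a)(mp-d) - bc
  = (mp)^2 - (a+d)\,mp + (ad-bc)
  = (mp)^2 - (S+m)\,mp + Sm
  = (mp-S)(mp-m) > 0 \qquad (p \ge 2).
```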

Let $p_0$ be the smallest integer $p_0\ge2$ such that, for any $p\ge p_0$,
$$\frac{m(p-1)}{(mp-a)(mp-d)-bc}\,\bigl(1 + 8\log(p+2)\bigr)^{S+1} < 1.$$
Such a $p_0$ exists since the left-hand side goes to $0$ when $p$ goes to $+\infty$. Denote
$$A := \max\bigl\{(u_q)^{1/q},\,(v_q)^{1/q} : 1\le q\le p_0\bigr\}.$$
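Assume, by induction on $p\ge p_0+1$, that for every $q\le p-1$, $(u_q)^{1/q}\le A$ and $(v_q)^{1/q}\le A$. Then,
$$
\begin{cases}
(mp-a)\,u_p \le b\,v_p + A^p\displaystyle\sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\frac{\varphi(p_1)^{p_1}\cdots\varphi(p_{S+1})^{p_{S+1}}}{\varphi(p)^p}\\[6pt]
(mp-d)\,v_p \le c\,u_p + A^p\displaystyle\sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\frac{\varphi(p_1)^{p_1}\cdots\varphi(p_{S+1})^{p_{S+1}}}{\varphi(p)^p}.
\end{cases}
$$
Let
$$\Phi(p) := \sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\frac{\varphi(p_1)^{p_1}\cdots\varphi(p_{S+1})^{p_{S+1}}}{\varphi(p)^p},$$
so that
$$\begin{cases}(mp-a)\,u_p\le b\,v_p + A^p\,\Phi(p)\\ (mp-d)\,v_p\le c\,u_p + A^p\,\Phi(p),\end{cases}$$
which implies, after multiplying the first inequality by $mp-d>0$, adding $b$ times the second one and using $mp-d+b = m(p-1)$,
$$u_p \le \frac{m(p-1)}{(mp-a)(mp-d)-bc}\,A^p\,\Phi(p),$$
and the same inequality for $v_p$ as well (using $mp-a+c = m(p-1)$). Admit for a while the following lemma.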



Lemma 4 For every $p\ge2$,
$$\Phi(p)\le\bigl(1+8\log(p+2)\bigr)^{S+1}.$$

Consequently,
$$u_p\le\frac{m(p-1)}{(mp-a)(mp-d)-bc}\,\bigl(1+8\log(p+2)\bigr)^{S+1}A^p.$$
By definition of $p_0$, this implies that $(u_p)^{1/p}\le A$ (and similarly $(v_p)^{1/p}\le A$), and the recurrence holds.
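The proof only needs $p_0$ to exist, but it can be evaluated for a concrete urn. The Python sketch below (hypothetical helper names) works with the logarithm of the left-hand side defining $p_0$ and locates the crossing of the level 1 by doubling and bisection. This search assumes, without proof, that the left-hand side stays at least 1 until it first drops below 1 and never comes back above 1; a rough derivative computation suggests that it increases and then decreases, but this is only a heuristic. For $R = \begin{pmatrix}4&1\\1&4\end{pmatrix}$ the value found is well beyond $10^{10}$, which illustrates that the argument is qualitative: the constants in Lemma 4 are not meant to be sharp.

```python
import math


def log_ratio(p, a, b, c, d):
    """Logarithm of the quantity whose values below 1 define p0 (p >= 2)."""
    S, m = a + b, a - c
    det = (m * p - a) * (m * p - d) - b * c  # equals (mp - S)(mp - m) > 0 for p >= 2
    return (math.log(m * (p - 1)) + (S + 1) * math.log(1 + 8 * math.log(p + 2))
            - math.log(det))


def smallest_p0(a, b, c, d):
    """Smallest integer p >= 2 after which log_ratio stays negative.

    Assumption (heuristic, not proved): log_ratio is non-negative before it
    first becomes negative, and stays negative afterwards.
    """
    def f(p):
        return log_ratio(p, a, b, c, d)

    if f(2) < 0:
        return 2
    lo, hi = 2, 4
    while f(hi) >= 0:  # doubling search: f(lo) >= 0 > f(hi) at exit
        lo, hi = hi, 2 * hi
    while hi - lo > 1:  # integer bisection keeping f(lo) >= 0 > f(hi)
        mid = (lo + hi) // 2
        if f(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    p0 = smallest_p0(4, 1, 1, 4)  # S = 5, m = 3, sigma = 3/5
    print(p0, math.log10(p0))
```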



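Proof of Lemma 4. The definitions of $\varphi$ and $\Phi$ imply directly that
$$
\begin{aligned}
\Phi(p) &= \sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\frac{\log^{p_1}(p_1+2)\cdots\log^{p_{S+1}}(p_{S+1}+2)}{\log^p(p+2)}\\
&= \sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\Bigl(1+\frac{\log\bigl(1-\frac{p-p_1}{p+2}\bigr)}{\log(p+2)}\Bigr)^{p_1}\cdots\Bigl(1+\frac{\log\bigl(1-\frac{p-p_{S+1}}{p+2}\bigr)}{\log(p+2)}\Bigr)^{p_{S+1}}.
\end{aligned}
\qquad(20)
$$
Using $\log(1-u)\le-u$ for all $u<1$ leads to
$$\Phi(p)\le\sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\Bigl(1-\frac{p-p_1}{(p+2)\log(p+2)}\Bigr)^{p_1}\cdots\Bigl(1-\frac{p-p_{S+1}}{(p+2)\log(p+2)}\Bigr)^{p_{S+1}},$$
which can be written with an exponential to get, using again $\log(1-u)\le-u$:
$$\Phi(p)\le\sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\exp\Bigl(-\sum_{j=1}^{S+1}\frac{p_j(p-p_j)}{(p+2)\log(p+2)}\Bigr).$$
Let $\psi(x) := \exp\Bigl(-\frac{p^2}{(p+2)\log(p+2)}\,x(1-x)\Bigr)$, so that
$$\Phi(p)\le\sum_{\substack{p_1+\dots+p_{S+1}=p\\ p_j\le p-1}}\psi\Bigl(\frac{p_1}{p}\Bigr)\cdots\psi\Bigl(\frac{p_{S+1}}{p}\Bigr)\le\sum_{0\le p_1,\dots,p_{S+1}\le p-1}\psi\Bigl(\frac{p_1}{p}\Bigr)\cdots\psi\Bigl(\frac{p_{S+1}}{p}\Bigr)=\Bigl(\sum_{k=0}^{p-1}\psi\Bigl(\frac kp\Bigr)\Bigr)^{S+1}.$$
Elementary calculations lead then to
$$\sum_{k=0}^{p-1}\psi\Bigl(\frac kp\Bigr)\le1+p\int_0^1\psi(t)\,dt$$
(since $\psi$ is decreasing on $[0,\frac12]$ and increasing on $[\frac12,1]$), and, for any $\alpha>0$,
$$\int_0^1\exp\bigl(-\alpha t(1-t)\bigr)\,dt\le2\int_0^{1/2}e^{-\alpha t/2}\,dt\le\frac4\alpha,$$
so that, for $p\ge2$,
$$\sum_{k=0}^{p-1}\psi\Bigl(\frac kp\Bigr)\le1+\frac{4(p+2)\log(p+2)}{p}\le1+8\log(p+2),$$
and the lemma holds. ■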
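The chain of inequalities in this proof can be checked numerically for small $p$. The Python sketch below (hypothetical helper names) evaluates $\Phi(p)$ by brute force over all $(p_1,\dots,p_{S+1})$, with the convention $\varphi(0)^0 = 1$, and compares it with the intermediate bound $\bigl(\sum_{k=0}^{p-1}\psi(k/p)\bigr)^{S+1}$ and with the final bound of Lemma 4. Brute force is only practical for small $p$ and $S$, since the number of terms grows like $\binom{p+S}{S}$.

```python
import math


def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def big_phi(p, S):
    """Phi(p) with phi(q) = log(q + 2), by brute force over (p_1, ..., p_{S+1})."""
    log_den = p * math.log(math.log(p + 2))
    total = 0.0
    for parts in compositions(p, S + 1):
        if max(parts) == p:  # terms with some p_j = p are excluded (p_j <= p - 1)
            continue
        log_num = sum(q * math.log(math.log(q + 2)) for q in parts if q > 0)
        total += math.exp(log_num - log_den)
    return total


def psi_sum(p):
    """sum_{k=0}^{p-1} psi(k/p), with psi as in the proof of Lemma 4."""
    alpha = p * p / ((p + 2) * math.log(p + 2))
    return sum(math.exp(-alpha * (k / p) * (1 - k / p)) for k in range(p))


if __name__ == "__main__":
    S = 5
    for p in (2, 5, 10, 15):
        print(p, big_phi(p, S), psi_sum(p) ** (S + 1), (1 + 8 * math.log(p + 2)) ** (S + 1))
```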

The upper bound on the moments obtained in Lemma 3 leads to the following theorem.

Theorem 9 Let $X$ and $Y$ be integrable solutions of any fixed point equation (15) or (18). Then, $X$ and $Y$ admit absolute moments of all orders $p\ge1$, and the probability distributions of $|X|$, $|Y|$, $X$ and $Y$ are determined by their moments.
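Proof. By Lemma 3, if $X$ and $Y$ are integrable solutions of (18), they admit moments of all orders and, for some positive constant $C$, when $p$ is large enough,
$$\bigl(\mathbb E|X|^p\bigr)^{-1/p}\ge\frac{C}{(p!)^{1/p}\log p}\qquad(21)$$
(and similarly for $Y$). Besides, by Stirling's formula, when $p$ tends to infinity,
$$\frac{1}{(p!)^{1/p}\log p}\sim\frac{e}{p\log p},$$
which is the general term of a divergent Bertrand series. Carleman's criterion applies, implying that $X$ and $Y$ are moment determined.

If $X$ and $Y$ are integrable solutions of (15) and if $\xi$ is an independent Gamma$(\frac1S)$-distributed random variable, then, thanks to Proposition 1, $\xi^{\sigma}X$ and $\xi^{\sigma}Y$ are integrable solutions of (18), so that they both satisfy Carleman's criterion. Since $\mathbb E|\xi^{\sigma}X|^p = \mathbb E\bigl(\xi^{\sigma p}\bigr)\,\mathbb E|X|^p$ and $\mathbb E\bigl(\xi^{\sigma p}\bigr) = \Gamma(\frac1S+\sigma p)/\Gamma(\frac1S)$ tends to infinity with $p$, one has $\mathbb E|X|^p\le\mathbb E|\xi^{\sigma}X|^p$ for $p$ large enough (and similarly for $Y$). This implies that $X$ and $Y$ are moment determined as well. ■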



Corollary 1 For any initial composition, the limit laws $W^{DT}$ and $W^{CT}$ of a large Polya urn process are determined by their moments.

Proof. For elementary initial compositions $(1,0)$ or $(0,1)$, the result is a direct consequence of Theorems 7, 8 and 9. For a general initial composition $(\alpha,\beta)$ in continuous time, notice that the decomposition Formula (16) implies that
$$\bigl\|W^{CT}_{(\alpha,\beta)}\bigr\|_p\le\alpha\,\bigl\|W^{CT}_{(1,0)}\bigr\|_p+\beta\,\bigl\|W^{CT}_{(0,1)}\bigr\|_p.$$
Since $W^{CT}_{(1,0)}$ and $W^{CT}_{(0,1)}$ satisfy (21), $W^{CT}_{(\alpha,\beta)}$ satisfies Carleman's criterion; it is thus determined by its moments. The same arguments hold in discrete time, using the decomposition Formula (12).



6 Appendix: Polya urns and Dirichlet distribution 

In this section, we deal with results that belong to the "folklore": they are neither new nor very difficult, but, to the best of our knowledge, they are nowhere properly gathered. Proposition 2 goes back to Athreya [3] with different names and a different proof. It is partially given in Blackwell and Kendall [8] for $S = 1$ and starting from one ball of each color. The moment method is mentioned in Johnson and Kotz's book [18]. We detail here a proof to make our paper self-contained.
6.1 Dirichlet distributions

This section gathers some well-known facts on Dirichlet distributions. Besides, we fix the notations used in the sequel.

Let $d\ge2$ be a natural integer. Let $\Sigma$ be the $(d-1)$-dimensional simplex
$$\Sigma = \Bigl\{(x_1,\dots,x_d)\in[0,1]^d : \sum_{k=1}^d x_k = 1\Bigr\}.$$
The following formula is a generalization of the definition of Euler's Beta function: let $(\nu_1,\dots,\nu_d)$ be positive real numbers. Then,
$$\int_\Sigma\prod_{k=1}^d x_k^{\nu_k-1}\,d\Sigma(x_1,\dots,x_d) = \frac{\Gamma(\nu_1)\cdots\Gamma(\nu_d)}{\Gamma(\nu_1+\dots+\nu_d)}\qquad(22)$$
where $d\Sigma$ denotes the positive measure on the simplex $\Sigma$ defined by
$$\int_\Sigma f(x_1,\dots,x_d)\,d\Sigma(x_1,\dots,x_d) = \int_{\mathbb R^{d-1}} f\Bigl(x_1,\dots,x_{d-1},1-\sum_{k=1}^{d-1}x_k\Bigr)\,\mathbf 1_{\{x\in[0,1]^{d-1},\ \sum_{k=1}^{d-1}x_k\le1\}}\,dx_1\cdots dx_{d-1}$$
for any continuous function $f$ defined on $\Sigma$.

By means of this formula, one usually defines the Dirichlet distribution with parameters $(\nu_1,\dots,\nu_d)$, denoted by Dirichlet$(\nu_1,\dots,\nu_d)$, whose density on $\Sigma$ is given by
$$\frac{\Gamma(\nu_1+\dots+\nu_d)}{\Gamma(\nu_1)\cdots\Gamma(\nu_d)}\prod_{k=1}^d x_k^{\nu_k-1}\,d\Sigma(x_1,\dots,x_d).$$
In particular, if $D = (D_1,\dots,D_d)$ is a $d$-dimensional random vector which is Dirichlet-distributed with parameters $(\nu_1,\dots,\nu_d)$, then, for any $p = (p_1,\dots,p_d)\in\mathbb N^d$, the (joint) moment of order $p$ of $D$ is
$$\mathbb E\bigl(D^p\bigr) = \mathbb E\bigl(D_1^{p_1}\cdots D_d^{p_d}\bigr) = \frac{\Gamma(\nu)}{\Gamma(\nu+|p|)}\prod_{k=1}^d\frac{\Gamma(\nu_k+p_k)}{\Gamma(\nu_k)},$$
where $\nu = \sum_{k=1}^d\nu_k$ and $|p| = \sum_{k=1}^d p_k$.
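The moment formula is easy to evaluate with log-Gamma functions, and the Gamma description of Dirichlet distributions gives a simple sampler. The Python sketch below (hypothetical helper names) uses the standard normalized form of that description, $D = (G_1,\dots,G_d)/(G_1+\dots+G_d)$ with independent $G_k\sim$ Gamma$(\nu_k)$, and compares the formula with a Monte Carlo estimate.

```python
import math
import random


def dirichlet_moment(nu, p):
    """E(D_1^{p_1} ... D_d^{p_d}) for D ~ Dirichlet(nu), computed with log-Gamma."""
    log_moment = math.lgamma(sum(nu)) - math.lgamma(sum(nu) + sum(p))
    log_moment += sum(math.lgamma(n + k) - math.lgamma(n) for n, k in zip(nu, p))
    return math.exp(log_moment)


def dirichlet_sample(nu, rng):
    """Dirichlet(nu) vector obtained by normalising independent Gamma(nu_k) variables."""
    g = [rng.gammavariate(n, 1.0) for n in nu]
    total = sum(g)
    return [x / total for x in g]


def empirical_moment(nu, p, n_samples, seed=0):
    """Monte Carlo estimate of E(D^p), to be compared with dirichlet_moment(nu, p)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x = dirichlet_sample(nu, rng)
        acc += math.prod(xi ** pi for xi, pi in zip(x, p))
    return acc / n_samples


if __name__ == "__main__":
    nu, p = [0.5, 1.0, 2.0], [1, 1, 0]
    print(dirichlet_moment(nu, p), empirical_moment(nu, p, 100_000))
```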

Finally, a computation of the same kind shows that the $[0,1]$-valued random variable $D_k$, which is the $k$-th marginal of $D$, is Beta$(\nu_k,\nu-\nu_k)$-distributed, i.e. admits the density
$$\frac{1}{B(\nu_k,\nu-\nu_k)}\,x^{\nu_k-1}(1-x)^{\nu-\nu_k-1}\,\mathbf 1_{[0,1]}(x).$$
Note that computing the asymptotics of such moments when $p$ tends to infinity by Stirling's formula shows that a Dirichlet distribution is determined by its moments. An alternative description of a Dirichlet distribution can be made by considering a sequence $(G_1,\dots,G_d)$ of independent random variables, $G_k$ being Gamma$(\nu_k)$-distributed, conditioned to the relation $\sum_{k=1}^d G_k = 1$.
6.2 Original/diagonal Polya urns

Proposition 2 Let $d\ge2$ and $S\ge1$ be integers. Let also $(\alpha_1,\dots,\alpha_d)\in\mathbb N^d\setminus\{0\}$. Let $(P_n)_{n\ge0}$ be the $d$-color Polya urn random process having $S\,I_d$ as replacement matrix and $(\alpha_1,\dots,\alpha_d)$ as initial composition. Then, almost surely and in any $L^t$, $t\ge1$,
$$\frac{P_n}{nS}\xrightarrow[n\to\infty]{}V,$$
where $V$ is a $d$-dimensional Dirichlet-distributed random vector, with parameters $(\frac{\alpha_1}{S},\dots,\frac{\alpha_d}{S})$.

Remark 2 For any $k\in\{1,\dots,d\}$, the $k$-th coordinate of $V$ is Beta$\bigl(\frac{\alpha_k}{S},\sum_{j\ne k}\frac{\alpha_j}{S}\bigr)$-distributed.

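Proof. We give here a short autonomous proof. Denote $\alpha = \sum_{k=1}^d\alpha_k\ge1$, and let $(\mathcal F_n)_{n\ge0}$ be the natural filtration of the process. Conditional expectation at time $n+1$ writes
$$\mathbb E\bigl(P_{n+1}\,\big|\,\mathcal F_n\bigr) = \Bigl(1+\frac{S}{\alpha+nS}\Bigr)P_n,$$
so that $\bigl(\frac{P_n}{\alpha+nS}\bigr)_{n\ge0}$ is a bounded, hence convergent, martingale with mean $(\frac{\alpha_1}{\alpha},\dots,\frac{\alpha_d}{\alpha})$; let $V$ be its limit. If $f$ is any function defined on $\mathbb R^d$,
$$\mathbb E\bigl(f(P_{n+1})\,\big|\,\mathcal F_n\bigr) = \Bigl(I+\frac{\Phi}{\alpha+nS}\Bigr)(f)(P_n),$$
where
$$\Phi(f)(v) = \sum_{k=1}^d v_k\bigl[f(v+Se_k)-f(v)\bigr]$$
($e_k$ is the $k$-th vector of the canonical basis of $\mathbb R^d$ and $v = \sum_{k=1}^d v_ke_k$). In particular, as can be straightforwardly checked, if $p = (p_1,\dots,p_d)\in\mathbb N^d$ and $|p| = \sum_{k=1}^d p_k$, the function
$$T_p : v\mapsto\prod_{k=1}^d\frac{\Gamma\bigl(\frac{v_k}{S}+p_k\bigr)}{\Gamma\bigl(\frac{v_k}{S}\bigr)},$$
defined on $\mathbb R^d$, is an eigenfunction of the operator $\Phi$, associated with the eigenvalue $|p|S$. Consequently, after a direct induction, for any $p\in\mathbb N^d$,
$$\mathbb E\,T_p(P_n) = \frac{\Gamma\bigl(\frac\alpha S+n+|p|\bigr)\,\Gamma\bigl(\frac\alpha S\bigr)}{\Gamma\bigl(\frac\alpha S+n\bigr)\,\Gamma\bigl(\frac\alpha S+|p|\bigr)}\,T_p(P_0),$$
so that, when $n$ tends to infinity, by Stirling's formula,
$$\mathbb E\,T_p(P_n)\sim n^{|p|}\,\frac{\Gamma\bigl(\frac\alpha S\bigr)}{\Gamma\bigl(\frac\alpha S+|p|\bigr)}\prod_{k=1}^d\frac{\Gamma\bigl(\frac{\alpha_k}{S}+p_k\bigr)}{\Gamma\bigl(\frac{\alpha_k}{S}\bigr)}.$$
Besides, expanding the real polynomials $X^p = X_1^{p_1}\cdots X_d^{p_d}$ in the basis $(T_p)_{p\in\mathbb N^d}$, one gets formulae
$$\Bigl(\frac vS\Bigr)^p = T_p(v)+\sum_{|k|\le|p|-1}a_{p,k}\,T_k(v),$$
where the $a_{p,k}$ are rational numbers. Consequently, when $n$ tends to infinity, one gets the asymptotics
$$\mathbb E\Bigl(\frac{P_n}{S}\Bigr)^p\sim\mathbb E\,T_p(P_n),$$
which implies that, for any $p\in\mathbb N^d$,
$$\mathbb E\Bigl(\frac{P_n}{nS}\Bigr)^p\xrightarrow[n\to\infty]{}\frac{\Gamma\bigl(\frac\alpha S\bigr)}{\Gamma\bigl(\frac\alpha S+|p|\bigr)}\prod_{k=1}^d\frac{\Gamma\bigl(\frac{\alpha_k}{S}+p_k\bigr)}{\Gamma\bigl(\frac{\alpha_k}{S}\bigr)}.$$
Note that this proves the convergence of the martingale in $L^t$ for all $t\ge1$. Since a Dirichlet distribution is determined by its moments, this shows that the law of $V$ is a Dirichlet distribution with parameters $(\frac{\alpha_1}{S},\dots,\frac{\alpha_d}{S})$. ■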
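Both ingredients of this proof lend themselves to a quick numerical check. The Python sketch below (hypothetical helper names) verifies the eigenfunction identity $\Phi(T_p) = |p|S\,T_p$ at one point with positive coordinates (where the Gamma ratios are defined), and simulates the diagonal urn to estimate the mean of the martingale $P_n/(\alpha + nS)$, which should be close to $(\frac{\alpha_1}{\alpha},\dots,\frac{\alpha_d}{\alpha})$.

```python
import math
import random


def T(p, v, S):
    """T_p(v) = prod_k Gamma(v_k / S + p_k) / Gamma(v_k / S), for positive v_k."""
    return math.exp(sum(math.lgamma(vk / S + pk) - math.lgamma(vk / S)
                        for vk, pk in zip(v, p)))


def generator(f, v, S):
    """Phi(f)(v) = sum_k v_k [f(v + S e_k) - f(v)]."""
    total = 0.0
    for k, vk in enumerate(v):
        shifted = list(v)
        shifted[k] += S
        total += vk * (f(shifted) - f(v))
    return total


def martingale_limit_mean(alpha, S, n_draws, n_runs, seed=0):
    """Average of P_n / (alpha + nS) over independent runs of the urn with matrix S I_d."""
    rng = random.Random(seed)
    d, total0 = len(alpha), sum(alpha)
    acc = [0.0] * d
    for _ in range(n_runs):
        comp = list(alpha)
        for _ in range(n_draws):
            k = rng.choices(range(d), weights=comp)[0]  # draw a ball uniformly at random
            comp[k] += S  # diagonal replacement: add S balls of the drawn color
        for k in range(d):
            acc[k] += comp[k] / (total0 + n_draws * S)
    return [x / n_runs for x in acc]


if __name__ == "__main__":
    v, S, p = [3.0, 5.0, 2.0], 2, (2, 1, 0)
    # Eigenfunction check: Phi(T_p)(v) should equal |p| S T_p(v).
    print(generator(lambda w: T(p, w, S), v, S), sum(p) * S * T(p, v, S))
```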